Generating a boxplot in ArgonStudio
A boxplot (or box-and-whisker plot) is a standard way of representing the distribution of numerical data using a five-number summary:
- minimum: the lower bound, or the smallest value in the data set.
- first quartile (Q1): the 25th percentile, or the median of the lower half of the data set.
- median (Q2): the middle value of the data set, or the 50th percentile.
- third quartile (Q3): the 75th percentile, or the median of the upper half of the data set.
- maximum: the upper bound, or the largest value in the data set.
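The five-number summary can be computed in a few lines of Python. This is a minimal sketch, not ArgonStudio's implementation: `five_number_summary` is a hypothetical helper name, and it assumes the quartiles are the 25th, 50th, and 75th percentiles computed by linear interpolation (the standard library's `statistics.quantiles` with `method="inclusive"`). Tools differ in how they compute quartiles, so results on other data may not match ArgonStudio exactly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: quartiles use linear interpolation between sorted data points
    (statistics.quantiles with method="inclusive"); ArgonStudio may differ.
    """
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

print(five_number_summary([1, 2, 3, 4, 5]))  # → (1, 2.0, 3.0, 4.0, 5)
```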
Introduction to boxplots
As explained above, a boxplot visually represents the five values that summarize a dataset: the minimum, first quartile, median, third quartile, and maximum.
Consider the following set of five numbers:
1, 2, 3, 4, 5
The minimum of the set is 1, the maximum is 5, the median is 3, Q1 is 2 (halfway between the minimum and the median), and Q3 is 4 (halfway between the median and the maximum).
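The same Q1 and Q3 also follow from the "median of each half" definition of the quartiles. The sketch below assumes one common variant of that rule (Tukey's hinges), in which the overall median is included in both halves when the count is odd; the function name `hinges` is hypothetical, and ArgonStudio may use a different quartile convention.

```python
from statistics import median

def hinges(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data.

    Assumption: for an odd number of values the middle value belongs to both
    halves (Tukey's hinges); other quartile conventions exist.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half; includes the middle value when n is odd
    lower, upper = data[:half], data[n - half:]
    return median(lower), median(upper)

print(hinges([1, 2, 3, 4, 5]))  # → (2, 4)
```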
Here is how you can generate this boxplot in ArgonStudio.
- Load the ArgonStudio editor.
- Click on the Chart tab at the top.
- In the Chart data text box, enter the values as shown in the image above.
- In the chart area below the text box, click on the Boxplot tab.
- Choose the X-column and Y-column as shown and click on Draw.
The generated boxplot shows these five values.
Inter-Quartile Range (IQR) and outliers
The interquartile range (IQR) is the difference between the upper and lower quartiles:
IQR = Q3 - Q1
To identify outliers, the minimum is set 1.5 times the IQR below the first quartile:
min = Q1 - 1.5 * IQR
Similarly, the maximum is set 1.5 times the IQR above the third quartile:
max = Q3 + 1.5 * IQR
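Data points that lie outside these limits are outliers. The whiskers extend only to the most extreme data points within the limits, which is why, for 1, 2, 3, 4, 5, the minimum shown is 1 rather than Q1 - 1.5 * IQR = -1.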
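The limits and the outlier test can be sketched as follows. This assumes the 1.5 * IQR rule described above and quartiles computed by linear interpolation with `statistics.quantiles(..., method="inclusive")`; the helper names `outlier_limits` and `find_outliers` and the `k` parameter are hypothetical, not part of ArgonStudio.

```python
from statistics import quantiles

def outlier_limits(values, k=1.5):
    """Return (lower, upper) = (Q1 - k * IQR, Q3 + k * IQR).

    Assumption: linear-interpolation quartiles; ArgonStudio may differ.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the data points that fall strictly outside the limits."""
    lower, upper = outlier_limits(values, k)
    return [v for v in values if v < lower or v > upper]

print(outlier_limits([1, 2, 3, 4, 5]))  # → (-1.0, 7.0)
print(find_outliers([1, 2, 3, 4, 5]))   # → []
```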
Let us add an outlier to the above data set so that it becomes:
1, 2, 3, 4, 5, 10
Regenerating the boxplot shows 10 as a separate outlier point, because it lies above Q3 + 1.5 * IQR.
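As a worked check, assume the quartiles are computed by linear interpolation over the six sorted values (one common convention; ArgonStudio's exact method is not stated here). Q1 then sits a quarter of the way from 2 to 3 and Q3 three quarters of the way from 4 to 5, which puts 10 above the upper limit:

```latex
\begin{aligned}
Q_1 &= 2 + 0.25\,(3 - 2) = 2.25, \qquad Q_3 = 4 + 0.75\,(5 - 4) = 4.75,\\
\mathrm{IQR} &= Q_3 - Q_1 = 4.75 - 2.25 = 2.5,\\
Q_1 - 1.5 \times \mathrm{IQR} &= 2.25 - 3.75 = -1.5,\\
Q_3 + 1.5 \times \mathrm{IQR} &= 4.75 + 3.75 = 8.5 < 10.
\end{aligned}
```

The limits depend on the quartile convention, so other tools may place them slightly differently.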
United States per-capita income by state
Let us now look at a real-world example of boxplots. The chart above shows per-capita income in the United States, with county-level values grouped by state. For each state, it shows a box-and-whisker plot of the income along with any outliers.
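Per-state summaries like those in this chart can be produced by grouping county values by state and then summarizing each group. This is a sketch, not how ArgonStudio aggregates the data: it assumes the input is hypothetical `(state, county, income)` tuples, that each state has at least two counties (older Python versions of `statistics.quantiles` require two data points), and linear-interpolation quartiles.

```python
from collections import defaultdict
from statistics import quantiles

def summaries_by_state(rows):
    """Group (state, county, income) rows by state and summarize each group.

    Returns {state: (minimum, Q1, median, Q3, maximum)} over that state's counties.
    Assumption: hypothetical row layout and linear-interpolation quartiles.
    """
    incomes = defaultdict(list)
    for state, _county, income in rows:
        incomes[state].append(income)

    result = {}
    for state, values in incomes.items():
        values.sort()
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        result[state] = (values[0], q1, q2, q3, values[-1])
    return result
```

Each tuple in the result holds the values needed to draw one state's box and whiskers.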
Parts of a boxplot
This image shows the different parts of a boxplot, using the distribution of per-capita income across the counties of New York State.
- IQR: the interquartile range, i.e. the difference between the third and first quartiles (the length of the box).
- minimum: the lowest value in the dataset that is not an outlier, i.e. the smallest data point no lower than Q1 - 1.5 * IQR.
- maximum: the highest value in the dataset that is not an outlier, i.e. the largest data point no higher than Q3 + 1.5 * IQR.
- outliers: the data points that lie outside these limits, below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
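The parts listed above can be computed together. This sketch assumes the 1.5 * IQR rule and quartiles computed by linear interpolation via `statistics.quantiles(..., method="inclusive")`; the function name `boxplot_parts` and its dictionary keys are hypothetical. The whisker ends are actual data points, the lowest and highest values that are not outliers, rather than the limits themselves.

```python
from statistics import quantiles

def boxplot_parts(values, k=1.5):
    """Compute box, median, whisker ends, and outliers for a boxplot.

    Assumption: linear-interpolation quartiles and the k * IQR outlier rule.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if lower <= v <= upper]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # lowest value that is not an outlier
        "whisker_high": inside[-1],  # highest value that is not an outlier
        "outliers": [v for v in data if v < lower or v > upper],
    }

parts = boxplot_parts([1, 2, 3, 4, 5, 10])
print(parts["whisker_high"], parts["outliers"])  # → 5 [10]
```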
In this article, we covered the basics of boxplots and how to draw them in ArgonStudio.
- A boxplot summarizes the distribution of a dataset.
- The minimum and maximum are shown at the ends of the whiskers.
- The ends of the box mark the first quartile (Q1) and the third quartile (Q3), and the line inside the box marks the second quartile (Q2), also known as the median.
- Any outliers are shown as points outside the whiskers.