Generating a boxplot in ArgonStudio

A boxplot (also called a box-and-whisker plot) is a standard way of representing the distribution of numerical data using a five-number summary:

  • the minimum: the lower bound or the smallest value of the set of numbers.
  • first quartile (Q1): the 25th percentile value or the median of the lower half of the data set.
  • median (or Q2): the middle value of the data set or the 50th percentile.
  • third quartile (Q3): the 75th percentile value or the median of the upper half of the data set.
  • the maximum: the upper bound or the largest value in the data set.

Introduction to boxplots

As explained above, a boxplot is a visual representation of the five values that summarize a dataset: the minimum, first quartile, median, third quartile and maximum.

Consider the set of the following five numbers.

1, 2, 3, 4, 5

The minimum of the set is 1, the maximum is 5, the median is 3, Q1 is 2 (in-between the minimum and the median) and Q3 is 4 (in-between the median and the maximum).
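The same five values can be checked with a short script. This is only a minimal sketch using Python's standard library: the function name five_number_summary is made up for illustration, and the "inclusive" quartile method is an assumption that happens to reproduce the values quoted above. ArgonStudio may compute quartiles with a different convention.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" treats the smallest and largest values as the
    # 0th and 100th percentiles and interpolates linearly between points.
    # This convention is an assumption, not necessarily ArgonStudio's.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


print(five_number_summary([1, 2, 3, 4, 5]))
# (1, 2.0, 3.0, 4.0, 5)
```

Different tools use slightly different quartile conventions, so Q1 and Q3 can differ a little from tool to tool for other data sets, even though they agree for this one.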

Here is how you can generate this boxplot in ArgonStudio.

  • Load the ArgonStudio editor.
  • Click on the Chart tab at the top.
  • In the Chart data text box, enter the values as shown in the image above.
  • In the chart area below the text box, click on the Boxplot tab.
  • Choose the X-column and Y-column as shown and click on Draw.

The boxplot generated shows these values.

Inter-Quartile Range (IQR) and outliers

The Inter-quartile range (IQR) is defined as the difference between the upper and lower quartiles:

IQR = Q3 - Q1

To identify outliers, the minimum is redefined as the value 1.5 times the IQR below the first quartile:

min = Q1 - 1.5 * IQR

Similarly, the maximum is redefined as the value 1.5 times the IQR above the third quartile:

max = Q3 + 1.5 * IQR

Using these definitions, outliers are defined as those data points that lie outside these limits.

Let us add an outlier to the above data set so that it becomes:

1, 2, 3, 4, 5, 10

Regenerating the boxplot shows 10 as an outlier point, since it lies above the upper limit Q3 + 1.5 * IQR.
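The outlier rule can be tried on this data set with a few lines of Python. The sketch below is illustrative only: find_outliers and its parameter k are hypothetical names, and the "inclusive" quartile method is an assumption rather than ArgonStudio's documented behavior.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the k * IQR rule."""
    # Quartile convention is an assumption; other tools may differ slightly.
    q1, _, q3 = quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # min = Q1 - 1.5 * IQR
    upper = q3 + k * iqr  # max = Q3 + 1.5 * IQR
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


lower, upper, outliers = find_outliers([1, 2, 3, 4, 5, 10])
print(lower, upper, outliers)
# -1.5 8.5 [10]
```

With this convention Q1 is 2.25 and Q3 is 4.75, so the IQR is 2.5 and the limits are -1.5 and 8.5. If the quartiles are instead taken as the medians of the lower and upper halves (Q1 = 2, Q3 = 5), the upper limit becomes 9.5, and 10 is still an outlier.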

United States per-capita income by state

Let us now look at a real-world example of a boxplot. The above chart presents per-capita income in the United States for each state, aggregated by county. For each state, the chart shows the box-and-whisker plot of the income as well as the outliers.
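A chart like this draws one box per group. The sketch below shows the idea of summarizing county values separately for each state. Everything in it is an assumption for illustration: the state names and income figures are placeholders, not real data, and the summarize helper and the "inclusive" quartile method are not taken from ArgonStudio.

```python
from statistics import quantiles

# Placeholder figures for illustration only; these are NOT real income data.
county_income_by_state = {
    "State A": [21000, 24000, 26000, 27000, 30000, 52000],
    "State B": [18000, 19500, 20000, 22000, 23500],
}


def summarize(values, k=1.5):
    """Return the quartiles and the k * IQR outliers for one group."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [v for v in data if v < lower or v > upper],
    }


# One summary (and therefore one box) per state.
for state, incomes in county_income_by_state.items():
    print(state, summarize(incomes))
```

In this made-up input, only the 52000 value in "State A" falls outside its state's limits, so it would be drawn as a separate outlier point.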

Parts of a boxplot

This image shows the different parts of a boxplot. It represents the distribution of per-capita income across the counties of New York state.

  • IQR: the inter-quartile range, which is the difference between the third and the first quartile.
  • minimum: the lower limit of the distribution once outliers are excluded, computed as Q1 minus 1.5 times the IQR.
  • maximum: the upper limit of the distribution once outliers are excluded, computed as Q3 plus 1.5 times the IQR.
  • outliers: the data points that lie outside the range between this minimum and maximum.

Summary

In this article, we covered some basics of boxplots and how to draw one in ArgonStudio.

  • A boxplot shows a representation of the data distribution.
  • It includes the dataset maximum and minimum at the extremities of the whiskers.
  • The box spans from the first quartile (Q1) to the third quartile (Q3), with a line inside the box marking the second quartile (Q2), also known as the median.
  • Any outliers are shown as points outside the whiskers.