Solved – What are some alternatives to a boxplot

boxplot, data visualization, distributions

I am creating a website that displays census data for user-selected polygons, and I would like to show the distribution of various parameters graphically (one graph per parameter).

The data usually has the following properties:

  1. The sample size tends to be large (say, around 10,000 data points)
  2. The range of values tends to be quite large (for example, the minimum population can be less than 100 and the maximum can be something like 500,000)
  3. q1 is usually close to the minimum (say 200), while q2 and q3 are within 10,000
  4. It doesn't look anything like a normal distribution

I am not a statistician and hence my description might not be exactly clear.

I would like to show this distribution on a graph, which will be seen by citizens (the layman, if you like).

I would have preferred to use a histogram, but the large range of values makes choosing bins far from easy or straightforward.

From what little I know about statistics, a box plot is often used to show this kind of data, but I feel that deciphering a box plot is not easy for a layperson.

What are my options to show this data in an easy to understand manner?

Best Answer

A boxplot isn't that complicated. After all, you just need to compute the three quartiles, plus the min and max, which define the range. A subtlety arises when we want to draw the whiskers, and various methods have been proposed. For instance, in a Tukey boxplot, values lying more than 1.5 times the inter-quartile range below the first quartile or above the third quartile are considered outliers and displayed as individual points. See also Methods for Presenting Statistical Information: The Box Plot, by Kristin Potter, for a good overview. The R software implements a slightly different rule, but the source code is available if you want to study it (see the boxplot() and boxplot.stats() functions). However, the standard boxplot is not very useful when the interest is in identifying outliers in a very skewed distribution (but see An adjusted boxplot for skewed distributions, by Hubert and Vandervieren, CSDA 2008 52(12)).
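To make the Tukey rule concrete, here is a minimal Python sketch of the numbers a boxplot needs. The function name `tukey_boxplot_stats` and the multiplier parameter `k` are made up for illustration, and the quartile convention (linear interpolation via `statistics.quantiles` with `method="inclusive"`) is an assumption; other conventions, including the one R uses, can give slightly different values.

```python
from statistics import quantiles


def tukey_boxplot_stats(values, k=1.5):
    """Return quartiles, whisker ends and outliers for a Tukey-style boxplot.

    `k` is the fence multiplier (1.5 in the usual Tukey rule). The quartile
    convention used here is an assumption, not the only valid choice.
    """
    data = sorted(values)
    # Three cut points splitting the data into four groups: q1, median, q3.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers extend to the most extreme data points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Everything beyond the fences is drawn as an individual point.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Small made-up sample with one extreme value.
    print(tukey_boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

With strongly right-skewed data such as the population counts described in the question, this rule will probably flag a large share of the upper tail as outliers, which is the limitation the adjusted boxplot tries to address.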

As far as online visualization is concerned, I would suggest taking a look at Protovis, a plugin-free JavaScript toolbox for interactive web displays. The examples page illustrates what can be achieved with it in very few lines.