Solved – Outliers for boxplot

boxplot, data visualization

I have a data set that I want to display as a boxplot. When the boxplot has outliers, each one is marked with a *. If two outliers have the same value, how should that be shown in the boxplot? Is it still a single *?

Best Answer

I'll use R in proposing a solution. Let's simulate some data:

set.seed(1)
# 100 standard-normal draws plus tied outliers: three 5s and two 7s
foo <- c(rnorm(100, 0, 1), 5, 5, 5, 7, 7)

I see two possibilities. The first is to plot the boxplot and overlay a sunflower plot of the outliers, in which the number of petals on a point shows how many observations share that value:

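# Draw the boxplot and keep its statistics; bar$out holds the outlier values
bar <- boxplot(foo)
# Overlay sunflowers at x = 1 (the position of the single box)
sunflowerplot(x = rep(1, length(bar$out)), y = bar$out, seg.col = 1, add = TRUE)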

[Figure: boxplot with sunflower-plotted outliers]
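To see which multiplicities the sunflowers are encoding, the tied outliers can be tallied directly. This is a minimal sketch that re-creates the simulated data from above and uses `boxplot.stats()` to get the outliers without drawing anything; the names `out` and `ties` are only illustrative.

```r
set.seed(1)
foo <- c(rnorm(100, 0, 1), 5, 5, 5, 7, 7)

# Outliers as boxplot() would flag them, without producing a plot
out <- boxplot.stats(foo)$out

# Count how many outliers share each value
ties <- table(out)
print(ties)
```

Values with a count above 1 are the ones that would otherwise be drawn as a single symbol.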

The second possibility is to draw the boxplot without its outliers and then add every outlier as its own point, jittered horizontally so that tied values no longer sit on top of each other (edited as per @chl's excellent suggestion):

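# Compute the boxplot statistics without drawing anything
bar <- boxplot(foo, plot = FALSE)
# Draw the box without outliers, keeping room for them on the y axis
boxplot(foo, outline = FALSE, ylim = range(bar$stats, bar$out))
# Add each outlier at x = 1 with a small random horizontal offset
points(jitter(rep(1, length(bar$out))), bar$out)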

[Figure: boxplot with jittered outliers]
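Because `jitter()` draws random offsets, the picture changes on every run. A possible variation, sketched here under the assumption that a fixed seed and a wider spread are wanted, sets the seed first and passes `amount` explicitly; the seed `42` and `amount = 0.05` are arbitrary choices, and `x_jit` is just an illustrative name.

```r
set.seed(1)
foo <- c(rnorm(100, 0, 1), 5, 5, 5, 7, 7)
out <- boxplot.stats(foo)$out

# Fix the seed so the horizontal offsets are reproducible
set.seed(42)
# Offsets are drawn uniformly from [-amount, amount] around x = 1
x_jit <- jitter(rep(1, length(out)), amount = 0.05)

boxplot(foo, outline = FALSE, ylim = range(foo))
points(x_jit, out)
```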

Note that the first solution requires that your data are integers (otherwise you will run afoul of floating-point arithmetic; see question 7.31 in the R FAQ). With non-integer data you will need to do some additional work to ensure R knows which floating-point numbers should be treated as "equal".
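One way to do that additional work, offered as a sketch rather than the only approach, is to round the outliers to a chosen tolerance, count the ties yourself, and hand the counts to `sunflowerplot()` through its `number` argument. The data in `out` are made up for illustration, and rounding to 6 decimal places is an assumption you would adapt to your own measurements.

```r
# Hypothetical outliers: the first two differ only by floating-point noise
out <- c(0.1 + 0.2, 0.3, 5, 5, 7)

# Round to a tolerance before counting, so near-identical values tie
key <- round(out, 6)
counts <- table(key)

vals <- as.numeric(names(counts))  # distinct values after rounding
mult <- as.integer(counts)         # how many observations share each value

# Supply the multiplicities explicitly instead of letting them be computed
sunflowerplot(x = rep(1, length(vals)), y = vals, number = mult, seg.col = 1)
```

To overlay the result on an existing boxplot, as in the first solution, add `add = TRUE` to the `sunflowerplot()` call.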