Confused by location of fences in box-whisker plots

boxplot outliers

In one type of box-whisker plot, the fences at the ends of the whiskers are meant to indicate cutoff values beyond which any point would be considered an outlier.

The standard definitions I've found for these cutoff values are

$$
q_1 - k \times \mathrm{IQR}
$$
for the lower fence, and
$$
q_3 + k \times \mathrm{IQR}
$$
for the upper one, where $q_1$ and $q_3$ are the first and third quartiles, respectively, $\mathrm{IQR} := q_3 - q_1$ is the interquartile range, and $k$ is some constant $> 0$. (The value of $k$ I've seen most often is 1.5, with 3 being a distant second.)
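To make these definitions concrete, here is a minimal Python sketch. The function name `fences` and the sample data are hypothetical, and the quartile convention (`method="inclusive"` in the standard library's `statistics.quantiles`) is only one of several in use, so other packages may return slightly different quartiles for the same data.

```python
from statistics import quantiles

def fences(data, k=1.5):
    """Return (lower_fence, upper_fence, q1, q3) for a list of numbers."""
    # Assumption: "inclusive" quartiles; other conventions exist.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr, q1, q3

# Hypothetical sample data, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lower, upper, q1, q3 = fences(data)

print(lower, upper)            # the two fence values
print(q1 - lower, upper - q3)  # both distances equal k * IQR
```

By construction, both fences sit at the same distance $k \times \mathrm{IQR}$ from the box.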

So far so good.

The problem is that, with these definitions, the distance between the lower fence and $q_1$ would always be the same as the distance between the upper fence and $q_3$, namely $k \times \mathrm{IQR}$. In other words, the length of the upper whisker would always equal the length of the lower one.¹

This does not agree with the vast majority of box-whisker plots I see out there. Of course, for some of these plots the ends of the whiskers are supposed to represent the min and max values, so the comments above do not apply to them. But there are many other cases in which the fences are meant to denote the criterion for classifying points as outliers, and are supposedly based on formulae like the ones shown above, but nonetheless the resulting whiskers have different lengths. (For example.)

What am I missing?


¹ By "length of the upper/lower whisker" I mean, of course, the distance between the point where the whisker meets the box and the whisker's "free" end-point.

Best Answer

The upper whisker only goes as far as the largest data point that is less than the upper fence value, and the lower whisker only goes as far as the smallest data point that is greater than the lower fence value. For example, if $q_3 + k \times \mathrm{IQR} = 10$ and the data set had values $\lbrace\dots,5,6,7,8,12\rbrace$, then the whisker would only go as far as 8, and 12 would be the "outlier".

So, in short, the fence values $q_3 + k \times \mathrm{IQR}$ and $q_1 - k \times \mathrm{IQR}$ only represent the maximum extent to which the whiskers could go, if there were data points at those values. Thus the whiskers don't have to be (and rarely are) the same length.
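The rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular plotting package: the function name `whisker_ends` and the sample data are hypothetical, the quartiles use the `"inclusive"` convention of `statistics.quantiles`, and a point lying exactly on a fence is treated as inside it.

```python
from statistics import quantiles

def whisker_ends(data, k=1.5):
    """Return (low_end, high_end, q1, q3): whiskers stop at the most
    extreme data points that still lie within the fences."""
    # Assumption: "inclusive" quartiles; other conventions exist.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Assumption: points exactly on a fence count as inside.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return min(inside), max(inside), q1, q3

# Hypothetical sample data with one high outlier (30).
data = [1, 5, 6, 7, 8, 9, 10, 11, 30]
low_end, high_end, q1, q3 = whisker_ends(data)

lower_len = q1 - low_end   # length of the lower whisker
upper_len = high_end - q3  # length of the upper whisker
print(low_end, high_end, lower_len, upper_len)
```

With this data the fences are symmetric about the box, but the whiskers stop at the last data point inside each fence, so their lengths differ.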