Solved – Why does Tableau’s box-and-whisker plot show outliers automatically, and how can I get rid of them?

boxplot

I have a data set shown as box-whisker graphs after disaggregating. See below.

[Image: screenshot of the Tableau box-and-whisker plots]

I am wondering why Tableau (the product I am using) automatically plots a whole bunch of values outside the box-whisker.
I thought the whiskers of a box plot marked the minimum and maximum. Tableau says the values above the upper whisker are outliers, but first, I don't see the need to show them, and second, I'm not sure what logic it uses to identify them. So I'm wondering whether anyone knows why someone would want a box-and-whisker graph that shows outliers separately rather than containing them within the whiskers. (I.e., is this common statistical practice?)

Best Answer

The usual (and original) definition of a box and whisker plot does include outliers (indeed, Tukey had two kinds of outlying points, which these days are often not distinguished).

Specifically, the ends of the whiskers in the Tukey boxplot go at the most extreme observations that still lie inside the inner fences, which are generally at the upper hinge + 1.5 H-spreads and the lower hinge - 1.5 H-spreads (basically, UQ + 1.5 IQR and LQ - 1.5 IQR). Anything outside those fences is marked as an outlier.
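To make the rule concrete, here is a minimal Python sketch of the calculation described above. The function name `tukey_boxplot_stats` and the sample data are made up for illustration. The hinges are taken as the medians of the lower and upper halves of the sorted data (sharing the middle value when the count is odd), which is one common convention; packages differ slightly in how they compute quartiles, so treat this as an approximation of what any particular tool does rather than a description of Tableau's or R's exact code.

```python
from statistics import median


def tukey_boxplot_stats(values, k=1.5):
    """Return hinges, inner fences, whisker ends and outliers (Tukey-style).

    Hypothetical helper for illustration; k is the fence multiplier
    (1.5 H-spreads for the inner fences).
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")

    # Hinges: medians of the lower and upper halves of the sorted data.
    # When n is odd, both halves include the middle observation.
    half = (n + 1) // 2
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    h_spread = upper_hinge - lower_hinge

    # Inner fences sit k H-spreads beyond each hinge.
    lower_fence = lower_hinge - k * h_spread
    upper_fence = upper_hinge + k * h_spread

    # Whiskers end at the most extreme observations inside the fences,
    # not at the fences themselves.
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]

    return {
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up data: nine evenly spaced values plus one extreme value.
stats = tukey_boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])
# prints: 1 9 [30]
```

In this example the hinges are 3 and 8, so the H-spread is 5 and the inner fences fall at -4.5 and 15.5. The upper whisker stops at 9, the largest observation inside the fence, and 30 is drawn as a separate point. A whisker drawn to the true maximum would instead stretch all the way to 30.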

That's what R does, for example:

[Image: boxplot of stopping distances]

There are many variations on the box plot, and some packages implement something other than the Tukey boxplot, but the Tukey version is the most common. Indeed, Wickham & Stryjewski's "40 years of boxplots" describes numerous variations (and that's only a fraction of what can be found out there).

See Wikipedia's article on the box plot for some basic details.

Incidentally, Tableau isn't just showing outliers in your plot; it's showing all the data. You can see that it marks points between the ends of the whiskers, and even points inside the boxes, not just the ones outside the inner fences.

Tableau describes its boxplots here; as you see the description broadly matches what I describe for Tukey boxplots above.


Edit: This is just to add a drawing of what the boxplot elements look like in the Schmid and Crowe references mentioned in comments so people don't have to chase them down to see what was being discussed:

[Image: drawing of the boxplot elements in the Schmid and Crowe references]

(the Crowe version is slightly tweaked here in a couple of ways, one of which makes it seem a bit more boxplot-like; I may do a more faithful version later)
