QQ-Plots vs. Histograms – Benefits and Applications Explained


In this comment, Nick Cox wrote:

Binning into classes is an ancient method. While histograms can be useful, modern statistical software makes it easy as well as advisable to fit distributions to the raw data. Binning just throws away detail that is crucial in determining which distributions are plausible.

The context of this comment suggests using QQ-plots as an alternative means of evaluating the fit. The statement sounds very plausible, but I would like a reliable reference supporting it. Is there a paper that investigates this more thoroughly, beyond a simple “well, this sounds obvious”? Any systematic comparisons of results or the like?

I'd also like to see how far this benefit of QQ-plots over histograms can be stretched to applications other than model fitting. Answers to this question agree that “a QQ-plot […] just tells you that "something is wrong"”. I am thinking about using them as a tool to identify structure in observed data relative to a null model, and I wonder whether there are established procedures for using QQ-plots (or their underlying data) not only to detect but also to describe non-random structure in the observed data. References that go in this direction would be particularly useful.

Best Answer

The canonical paper here was:

  • Wilk, M.B. and R. Gnanadesikan. 1968. Probability plotting methods for the analysis of data. Biometrika 55: 1-17

and it still repays close and repeated reading. A lucid treatment with many good examples was given by:

  • Cleveland, W.S. 1993. Visualizing Data. Summit, NJ: Hobart Press.

and it is worth mentioning the more introductory:

  • Cleveland, W.S. 1994. The Elements of Graphing Data. Summit, NJ: Hobart Press.

Other texts containing reasonable exposure to this approach include:

  • Davison, A.C. 2003. Statistical Models. Cambridge: Cambridge University Press.
  • Rice, J.A. 2007. Mathematical Statistics and Data Analysis. Belmont, CA: Duxbury.
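The references above cover probability plotting in depth. As a minimal sketch of the basic construction, assuming a normal reference distribution and plotting positions $(i - 0.5)/n$ (one common convention among several), the points of a normal QQ-plot can be computed with Python's standard library alone. The function name `normal_qq_points` and the sample values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical, observed) pairs for a normal QQ-plot.

    Assumes plotting positions p_i = (i - 0.5) / n, one common convention;
    others (e.g. (i - 3/8) / (n + 1/4)) differ slightly, mostly in the tails.
    """
    observed = sorted(data)            # order statistics x_(1) <= ... <= x_(n)
    n = len(observed)
    std_normal = NormalDist()          # standard normal: mean 0, SD 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


if __name__ == "__main__":
    # Hypothetical sample, invented for illustration only.
    sample = [2.1, 3.4, 1.9, 5.6, 2.8, 3.0, 4.4, 2.2]
    for q, x in normal_qq_points(sample):
        print(f"{q:7.3f}  {x:5.2f}")
```

Plotting observed against theoretical, points close to a straight line are consistent with a normal distribution, with intercept and slope roughly estimating the mean and SD. Systematic curvature points to skewness or tail weight, and runs of tied values show up as flat steps, so granularity stays visible without any binning decision.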

That aside, I don't know of anything that is quite what you ask. Once you have seen the point of quantile-quantile plots, showing in detail that histograms are a second-rate alternative seems neither interesting nor useful, too much like shooting fish in a barrel.

But I would summarize like this:

  1. Binning suppresses details, and the details are often important. This can apply not only to exactly what is going on in the tails but also to what is going on in the middle. For example, granularity or multimodality may be important as well as skewness or tail weight.

  2. Binning requires decisions about bin origin and bin width, which can affect the appearance of histograms mightily, so it is hard to see what is real and what is a side-effect of choices. If your software makes these decisions for you, the problems remain. (For example, default bin choices are often designed so that you do not use "too many bins", i.e. with the motive of smoothing a little.)

  3. The graphical and psychological problem of comparing two histograms is trickier than that of judging the fit of a set of points to a straight line.

  4. [Added 27 Sept 2017] Quantile plots can very easily be redrawn on one or more transformed scales. By transformation here I mean a nonlinear transformation, not e.g. scaling by a maximum or standardisation by (value $-$ mean) / SD. If the quantiles are just the order statistics, then all you need to do is apply the transformation, as e.g. the logarithm of the maximum is identically the maximum of the logarithms, and so forth. (Trivially, reciprocation reverses order.) Even if you plot selected quantiles that are based on two order statistics, they are usually just interpolated between two original data values, and the effect of the interpolation is generally minor. In contrast, histograms on log or other transformed scales require a fresh decision on bin origin and width; that decision isn't especially difficult, but it can be awkward. Much the same can be said of density estimation as a way to summarize the distribution. Naturally, whatever transformation you apply must make sense for the data, so logarithms can only usefully be applied to a positive variable.
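The bin-origin problem in point 2 and the transformation point in 4 can be checked on a toy example. This is a minimal sketch in Python, not a general tool: the sample values and the helper `histogram_counts` are hypothetical, invented for illustration, and bins are taken as half-open intervals $[a + kh, a + (k+1)h)$, one common convention.

```python
import math


def histogram_counts(data, origin, width, nbins):
    """Counts for bins [origin + k*width, origin + (k+1)*width), k = 0..nbins-1."""
    counts = [0] * nbins
    for x in data:
        k = math.floor((x - origin) / width)
        if 0 <= k < nbins:          # values outside the binned range are dropped
            counts[k] += 1
    return counts


# Hypothetical sample, invented for illustration only.
data = [0.8, 0.9, 1.1, 1.2, 2.4, 2.6, 2.7, 3.3]

# Point 2: same data, same bin width, only the bin origin moves.
print(histogram_counts(data, origin=0.0, width=1.0, nbins=4))  # -> [2, 2, 3, 1]
print(histogram_counts(data, origin=0.5, width=1.0, nbins=4))  # -> [4, 1, 3, 0]

# Point 4: order statistics commute with an increasing transformation,
# so a quantile plot on log scale needs no new bin decisions ...
order_stats = sorted(data)
assert [math.log(x) for x in order_stats] == sorted(math.log(x) for x in data)

# ... and reciprocation (for positive values) simply reverses the order.
assert [1 / x for x in reversed(order_stats)] == sorted(1 / x for x in data)
```

With the origin at 0 the counts look roughly flat with a mild peak; moving the origin to 0.5 suggests a tight cluster, a gap and a second cluster. The sorted values behind a quantile plot are the same whichever origin is chosen, and transforming them is a single elementwise step.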
