Name for a scatter plot which compares predicted vs observed values

data visualization, predictive-models, qq-plot, scatterplot, terminology

I have a scatter plot which compares predicted vs observed values. What is the appropriate name for this type of plot?

To elaborate, I have a set of measurements for a specific engineering quantity at several moments in time. I also have a model which predicts the value of that quantity at those same moments in time. To illustrate the quality of the model, I am plotting each (prediction, measurement) pair as a point in a scatter plot, with the prediction on the $X$ axis and the measurement on the $Y$ axis. I'm also drawing the line $y=x$ on the chart as a reference. (Points on that line represent perfect agreement between prediction and measurement.)

I have repeatedly heard this type of plot referred to as a "Q-Q plot". According to the definition of a Q-Q plot given on Wikipedia, this is technically incorrect because I'm not plotting quantiles. I am trying to determine if popular usage of the term "Q-Q plot" in situations like mine has made that the de facto name for this type of plot. If not, then what is an appropriate name for this type of plot?

Best Answer

A scatter plot of observed and predicted values is emphatically not a quantile-quantile plot, whose points always form a non-decreasing sequence.
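To make the distinction concrete, here is a minimal sketch in plain Python. The data values and the helper `is_non_decreasing` are hypothetical, chosen only for illustration, and the Q-Q construction shown is the simple case of two samples of equal size, where each sample is sorted separately and the values are paired by rank.

```python
# Hypothetical predictions and measurements at the same five moments in time.
predicted = [2.0, 5.0, 3.0, 8.0, 6.0]
observed = [2.5, 4.0, 3.5, 7.0, 7.5]

# Observed-versus-predicted scatter: each point pairs the two values
# that belong to the same moment in time.
scatter_points = list(zip(predicted, observed))

# Q-Q plot (equal sample sizes): sort each sample on its own, then pair
# by rank. The link between a prediction and its own measurement is lost.
qq_points = list(zip(sorted(predicted), sorted(observed)))


def is_non_decreasing(points):
    """Return True if y never decreases as x increases."""
    ordered = sorted(points)
    return all(a[1] <= b[1] for a, b in zip(ordered, ordered[1:]))


print(is_non_decreasing(qq_points))
print(is_non_decreasing(scatter_points))
```

With these made-up values the Q-Q points climb steadily, while the scatter points do not, because the prediction of 6.0 is paired with a larger measurement than the prediction of 8.0. Only the scatter plot says anything about agreement at individual moments in time.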

People often just describe the plot informally by what is on each axis, say "observed versus predicted" or "observed against fitted" (e.g. Chambers et al. 1983).

I'd suggest that plotting observed on the vertical or $y$ axis and predicted or fitted on the horizontal or $x$ axis is marginally preferable to the opposite convention for two reasons:

  1. Plotting response or outcome variable on the vertical axis is a common convention throughout science.

  2. This matches the widespread convention of plotting residuals on the vertical axis and predicted or fitted values on the horizontal axis in the closely associated residual plot. (Plots of observed versus fitted and of residual versus fitted show the same information; the first conveys the good news and can be easier to think of substantively, while the second conveys the bad news and can be easier to think of statistically, particularly when considering whether a model is adequate or can be improved.)
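The remark that the two plots show the same information can be checked with a small sketch in plain Python. The numbers are hypothetical, and the residual is taken as observed minus fitted, which is the usual definition.

```python
# Hypothetical fitted and observed values, for illustration only.
fitted = [2.0, 5.0, 3.0, 8.0, 6.0]
observed = [2.5, 4.0, 3.0, 7.0, 7.5]

# Residual = observed - fitted.
residuals = [o - f for o, f in zip(observed, fitted)]

# Coordinates for the two plots; both put fitted on the horizontal axis.
observed_vs_fitted = list(zip(fitted, observed))
residual_vs_fitted = list(zip(fitted, residuals))

# Each plot can be rebuilt from the other: observed = fitted + residual.
reconstructed = [(f, f + r) for f, r in residual_vs_fitted]

# Points on the line y = x in the first plot are exactly the points
# with residual 0 in the second.
on_reference_line = [f for f, o in observed_vs_fitted if o == f]
zero_residual = [f for f, r in residual_vs_fitted if r == 0]

print(reconstructed == observed_vs_fitted)
print(on_reference_line == zero_residual)
```

In other words, the residual plot is the observed versus fitted plot with the reference line $y=x$ rotated down to the horizontal line at zero, which is why departures from the model are often easier to see there.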

On which way round "versus" should be used, see the discussion at "versus (vs.): how to properly use this word in data analysis".

A more formal name is calibration plot (e.g. Harrell 2001, 2015; Venables and Ripley 2002; Gelman and Hill 2007).

Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A. 1983. Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.

Gelman, A. and Hill, J. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.

Harrell Jr., F.E. 2001. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer.

Harrell Jr., F.E. 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Cham: Springer.

Venables, W.N. and Ripley, B.D. 2002. Modern Applied Statistics with S. New York: Springer.
