Solved – Assessing fit with identity line in Q-Q plot

qq-plot, regression coefficients

I'm using a QQ plot to assess the similarity between two observed distributions. I need a number that quantifies how much the quantile-quantile plot separates itself (is there a better word for this concept?) from the identity line $y = x$ (IL).

The R^2 coefficient is of no use since it compares the points with the best fit line, not the IL.

I was thinking of simply summing the absolute distances of each quantile to the identity line so that the larger this number the more I can say they are separated from the IL.

Is there a better/recommended way to go about this? I'm just looking for a number that can be used to tell at a glance how far away my quantiles are from the IL.

An example of one of my QQ plots is shown in the graph below. In this example, I should get a somewhat extreme value (whether high or low) of whatever statistic I end up using (the data set for the plot can be downloaded here).

[Figure: example Q-Q plot]

Best Answer

The question underlines a simple fact often overlooked even in statistical circles: for two variables, correlation (like its square) quantifies linearity of relationship (does $y = a + bx$?), not agreement (does $y = x$?). A trivial example is that an exact proportionality $y = bx$ implies a correlation of 1 for any $b > 0$, but as $b$ moves away from 1, agreement (with equality of corresponding values as the ideal) certainly diminishes.

Comparing distributions quantile to quantile, i.e. comparing corresponding order statistics, adds one or two twists to comparisons that might be made between paired variables. The first twist arises only if the two distributions are compared through samples of different sizes, say $m$ and $n$ with $m < n$. Then the larger sample would usually be reduced to $m$ values by interpolation. The second twist is always present: sets of quantiles are weakly increasing, so measures of agreement should be interpreted in the light of the constraint that this implies.
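As a rough illustration of the first twist, here is a minimal Python sketch (not part of the original Stata analysis). The function name `paired_quantiles` and the plotting positions $(i - 0.5)/m$ are assumptions made for this example; other plotting-position conventions are equally defensible. The smaller sample is kept as its sorted values, and the larger sample is reduced to the same length by linear interpolation.

```python
import numpy as np

def paired_quantiles(x, y):
    """Return equal-length sorted arrays of matched quantiles (hypothetical helper).

    The smaller sample is returned as its order statistics; the larger one is
    reduced to the same length by linear interpolation at the plotting
    positions (i - 0.5) / m, which is an assumed convention.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))

    def reduce_to(large, m):
        n = large.size
        p_large = (np.arange(1, n + 1) - 0.5) / n  # positions of the larger sample
        p_small = (np.arange(1, m + 1) - 0.5) / m  # positions we want to evaluate
        return np.interp(p_small, p_large, large)

    if x.size < y.size:
        return x, reduce_to(y, x.size)
    if y.size < x.size:
        return reduce_to(x, y.size), y
    return x, y

qx, qy = paired_quantiles([3, 1, 2], [6, 5, 4, 3, 2, 1])
print(qx, qy)
```

Both returned arrays are weakly increasing by construction, which is the second twist mentioned above.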

Concordance correlation (a measure most associated with L. I-K. Lin, but invented earlier) is a correlation-like measure of agreement. The concordance correlation is 1 if (and only if) the two variables have identical values. Note that the question of agreement makes no sense unless the variables have the same units of measurement or, more generally, are recorded in the same way. The concordance correlation can be factored into the usual Pearson correlation and a bias correction. Pearson correlation and concordance correlation both return 1 when $y = x$, but departures from that condition will reduce concordance correlation even if they do not reduce Pearson correlation.
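For readers who want to experiment outside Stata, the following is a minimal Python sketch of Lin's concordance correlation, $\rho_c = 2 s_{xy} / (s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2)$, with variances and covariance computed using divisor $n$. The function name `concordance_correlation` is hypothetical, and this sketch is not claimed to match the output of any particular package to the last digit.

```python
import numpy as np

def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient (hypothetical helper).

    Uses divisor n for the variances and the covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x = np.mean((x - mean_x) ** 2)
    var_y = np.mean((y - mean_y) ** 2)
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3 * x  # exact proportionality: perfectly linear, poor agreement

r = np.corrcoef(x, y)[0, 1]            # Pearson correlation
rho_c = concordance_correlation(x, y)  # concordance correlation
print(r, rho_c, rho_c / r)             # rho_c / r is the bias correction factor
```

In this toy case the Pearson correlation is 1 while the concordance correlation is far below 1, which is the distinction between linearity and agreement described above.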

For an informal introduction to this area, see

Cox, N.J. 2006. Assessing agreement of measurements and predictions in geomorphology. Geomorphology 76: 332-346. http://www.sciencedirect.com/science/article/pii/S0169555X05003740

Here "in geomorphology" indicates the field of the examples, not a restriction of statistical scope. However, use of concordance correlation for paired quantiles is not covered there.

Using the data for the example above, these calculations in Stata show what is possible. Stata users would need to download concord after identifying locations using search concord, sj. A complete explanation is not given here. The main point for this example is that the concordance correlation, at 0.259, is quite different from the Pearson correlation, at 0.989. (Incidentally, the latter implies $R^2$ of 0.978.) No adjustments are made here for the effects of monotonicity constraints on confidence or significance results.

. concord y x

Concordance correlation coefficient (Lin, 1989, 2000):

 rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
---------------------------------------------------------------
 0.259     0.027      96     0.205  0.312    0.000   asymptotic
                             0.204  0.311    0.000  z-transform

Pearson's r =  0.989  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.262
Reduced major axis:   Slope =     3.133   Intercept =     0.008

Difference = y - x

      Difference                 95% Limits Of Agreement
 Average     Std Dev.             (Bland & Altman, 1986)
---------------------------------------------------------------
  0.286       0.170                 -0.048      0.620

Correlation between difference and mean = 0.994

Bradley-Blackwood F =  1.6e+04 (P = 0.00000)
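
The limits of agreement reported in the output above follow the usual Bland and Altman convention: the mean of the differences $y - x$ plus or minus 1.96 times their standard deviation. A minimal Python sketch of that calculation follows; the function name `limits_of_agreement` is hypothetical, and the use of the sample standard deviation (divisor $n - 1$) is an assumption about the convention rather than a statement about how concord computes it.

```python
import numpy as np

def limits_of_agreement(x, y, z=1.96):
    """Mean difference, SD of differences, and approximate 95% limits of agreement.

    Differences are taken as y - x; the SD uses divisor n - 1 (assumed).
    """
    d = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    mean_d = d.mean()
    sd_d = d.std(ddof=1)
    return mean_d, sd_d, mean_d - z * sd_d, mean_d + z * sd_d

# Toy data, not the data set from the question.
mean_d, sd_d, lower, upper = limits_of_agreement([0.0, 0.0, 0.0], [0.0, 1.0, 2.0])
print(mean_d, sd_d, lower, upper)
```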