Solved – Normality of residuals – contradiction between ‘symplot’ and ‘qnorm’

Tags: diagnostic, normality-assumption, regression, residuals, stata

After running a multiple linear regression analysis, I wanted to assess normality of residuals. I plotted a histogram which showed an almost normal distribution of residuals. I also used symplot and qnorm (in Stata) as additional diagnostic checks of normality. symplot gave the following plot which depicts a right-skewed distribution.

[Image: symmetry plot (symplot) of the residuals]

However, qnorm yielded the next plot, which shows a distribution very close to normal.

[Image: normal quantile plot (qnorm) of the residuals]

Given this apparent contradiction, how should I decide about the normality of the residuals? Since the histogram (not shown here) and the normal quantile plot (qnorm) both support normality, may I conclude that the residuals are normally distributed?

Best Answer

Note that this has nothing at all to do with residuals as such. It applies generally to looking at any distribution.

The two graphs do not have exactly the same purpose. Be clear that a symmetry plot checks for symmetry or asymmetry, so its points would lie close to the reference line for many symmetric distributions that are not Gaussian, e.g. t distributions with finite degrees of freedom. But there is still a question of whether the graphs contradict each other.
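As a rough illustration of that point, the sketch below draws from a t distribution and shows both plots. The variable name t5, the choice of 5 degrees of freedom, the sample size and the seed are all arbitrary assumptions made for this example. The symmetry plot should scatter around its reference line with no systematic drift to one side (extreme pairs can be noisy), while the normal quantile plot should bend away from its line in both tails.

```stata
* Sketch: a symmetric but non-Gaussian sample.
* t5, the 5 degrees of freedom and the seed are illustrative assumptions.
clear
set obs 10000
set seed 12345
generate t5 = rt(5)

* Checks symmetry only: a symmetric heavy-tailed sample can look fine here.
symplot t5

* Checks Gaussian shape: heavy tails show up as curvature at both ends.
qnorm t5
```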

I assume familiarity here with normal probability plots (historically the usual name, although a minority prefer Gaussian quantile-quantile plots). See for example this explanation.

However, symmetry plots seem less used and bear some explanation.

Stata's symplot, as the axis titles imply, pairs values above and below the median and plots (largest $-$ median) vs (median $-$ smallest), (second largest $-$ median) vs (median $-$ second smallest), etc. and the reference line is thus (value in upper half $-$ median) $=$ (median $-$ value in lower half), implying symmetry of distribution.
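The pairing just described can be rebuilt by hand, which may make the construction more concrete. This is a minimal sketch of the idea, not Stata's own implementation of symplot. The variable x, its gamma distribution, the sample size and the seed are assumptions chosen only for illustration.

```stata
* Sketch: rebuild the coordinates of a symmetry plot by hand.
* x and its gamma distribution are illustrative assumptions.
clear
set obs 200
set seed 12345
generate x = rgamma(2, 1)

quietly summarize x, detail
scalar med = r(p50)

* After sorting, observation i is the i-th smallest value and
* observation _N - i + 1 is the i-th largest value.
sort x
generate below = med - x
generate above = x[_N - _n + 1] - med

* Keep one row per (i-th smallest, i-th largest) pair.
keep if _n <= floor(_N/2)

* The second plot draws the diagonal above = below (perfect symmetry).
twoway scatter above below || line below below, sort legend(off)
```

For a right-skewed sample such as this one, the points for the outermost pairs should sit above the diagonal, because the largest values lie further from the median than the smallest values do.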

What you can't tell easily from symplot in cases like this is how many values are in the middle, often approximately symmetric, part of the distribution and how many are in the rest.

It is easy for symplot therefore to impart a pessimistic message because points may be heavily overplotted near the middle of the distribution.

Here is another example. I simulate 95% of values from a Gaussian and 5% of values from a gamma with the same variance (but evidently different skew).

This is the Stata recipe used:

```stata
clear
set obs 10000
set seed 2803
* First 9,500 values: Gaussian, mean 6, SD 10; last 500: gamma, shape 1, scale 10
gen y = cond(_n <= 9500, rnormal(6,10), rgamma(1,10))
symplot y
qnorm y
```

Loosely, the symplot seems to flag lack of symmetry (and thus lack of normality) more prominently than the normal probability plot (Gaussian quantile-quantile plot) flags lack of Gaussianity.

[Images: symmetry plot (symplot) and normal quantile plot (qnorm) of the simulated data]

It's manifestly the same data, but the tail is inevitably more prominent in one graph than in the other. In addition to the question of overplotting, in a symmetry plot all the bad news is usually lumped together at one end; in a normal probability plot there is often bad news in both tails.
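One way to gauge how much overplotting hides is to count how many plotted pairs sit near the median. The sketch below regenerates the simulated data from the recipe above and rebuilds the symmetry-plot coordinates by hand. The names below and above and the cutoff of 10 (one SD of the Gaussian component) are assumptions made for illustration, not part of symplot itself.

```stata
* Sketch: how many symmetry-plot pairs lie close to the median?
clear
set obs 10000
set seed 2803
generate y = cond(_n <= 9500, rnormal(6,10), rgamma(1,10))

quietly summarize y, detail
scalar med = r(p50)

* Pair the i-th smallest value with the i-th largest value.
sort y
generate below = med - y
generate above = y[_N - _n + 1] - med
keep if _n <= floor(_N/2)

* The cutoff of 10 is an arbitrary choice (one SD of the Gaussian part).
count if below < 10
display "Pairs within 10 of the median: " r(N) " of " _N
```

If most pairs fall under the cutoff, then most of the data is crowded into the lower-left corner of the symmetry plot, and the visibly stray points represent a small fraction of the sample.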