After running a multiple linear regression, I wanted to assess the normality of the residuals. I plotted a histogram, which showed an almost normal distribution of residuals. I also used `symplot` and `qnorm` (in Stata) as additional diagnostic checks of normality.
`symplot` gave the following plot, which depicts a right-skewed distribution.
However, `qnorm` yielded the next plot, which shows a distribution very close to normal.
Given this apparent contradiction, how should I decide about the normality of the residuals? Since the histogram (not shown here) and the normal quantile plot (`qnorm`) support normality, may I conclude that the residuals are normally distributed?
Best Answer
Note that this has nothing at all to do with residuals as such. It applies generally to looking at any distribution.
The two graphs do not have exactly the same purpose. Be clear that a symmetry plot checks for symmetry or asymmetry and would look simple for many symmetric distributions that were not Gaussian, e.g. t distributions with finite degrees of freedom. But there is still a question of whether the graphs contradict each other.
I assume here familiarity with normal probability plots (historically often so named, although a minority prefer the name Gaussian quantile-quantile plots). See for example this explanation.
However, symmetry plots seem less used and bear some explanation.
Stata's `symplot`, as the axis titles imply, pairs values above and below the median and plots (largest $-$ median) vs (median $-$ smallest), (second largest $-$ median) vs (median $-$ second smallest), etc. The reference line is thus (value in upper half $-$ median) $=$ (median $-$ value in lower half), implying symmetry of distribution.
What you can't tell easily from `symplot` in cases like this is how many values are in the middle, often approximately symmetric, part of the distribution and how many are in the rest.
It is therefore easy for `symplot` to impart a pessimistic message, because points may be heavily overplotted near the middle of the distribution.
Here is another example. I simulate 95% of values from a Gaussian and 5% of values from a gamma with the same variance (but evidently different skew).
This is the Stata recipe used:
Loosely, the `symplot` seems to flag lack of symmetry (and thus lack of normality) more prominently than the normal probability plot (Gaussian quantile-quantile plot) flags lack of Gaussianity.
It's manifestly the same data, but the tail is inevitably more prominent in one graph than another. In addition to the question of overplotting, in a symmetry plot all the bad news is usually lumped together at one end; in a normal probability plot there is often bad news in both tails.
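As a rough illustration of the comparison above, here is a minimal Python sketch; it is not the Stata recipe used for the graphs. It computes the coordinates behind a symmetry plot and a normal quantile plot for a simulated sample. The function names, the seed, the plotting positions $i/(n+1)$, and the mixture parameters (a standard Gaussian plus a gamma with shape 1 and scale 1, both with variance 1) are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, median, stdev


def symmetry_pairs(values):
    """Symmetry-plot coordinates: (distance below median, distance above median).

    The i-th smallest value is paired with the i-th largest; for odd n the
    median itself is left unpaired.
    """
    xs = sorted(values)
    med = median(xs)
    n = len(xs)
    return [(med - xs[i], xs[n - 1 - i] - med) for i in range(n // 2)]


def normal_quantile_pairs(values):
    """Normal quantile plot coordinates: (fitted normal quantile, observed value).

    Plotting positions i / (n + 1) are an assumption of this sketch.
    """
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    return [(fitted.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


def simulate_mixture(n=1000, gamma_share=0.05, seed=12345):
    """Mostly Gaussian sample with a small gamma component of equal variance.

    Assumed parameters: N(0, 1) and gamma(shape=1, scale=1), both variance 1.
    """
    rng = random.Random(seed)
    n_gamma = round(n * gamma_share)
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(n - n_gamma)]
    skewed = [rng.gammavariate(1.0, 1.0) for _ in range(n_gamma)]
    return gaussian + skewed


if __name__ == "__main__":
    y = simulate_mixture()
    sym = symmetry_pairs(y)
    qq = normal_quantile_pairs(y)

    # How many symmetry-plot points sit near the origin, where they overplot?
    sd = stdev(y)
    near_origin = sum(1 for below, above in sym if max(below, above) < sd)
    print(f"{near_origin} of {len(sym)} symmetry-plot points lie within 1 SD of the median")

    # Worst departures from each plot's reference line (y = x in both cases).
    print("largest |above - below|:", max(abs(a - b) for b, a in sym))
    print("largest |observed - fitted|:", max(abs(x - q) for q, x in qq))
```

Counting how many symmetry-plot points sit close to the origin gives a feel for how much of the well-behaved middle of the distribution is hidden by overplotting, while the few points far from the origin carry all the visible asymmetry.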