Solved – R: test normality of residuals of linear model – which residuals to use

lmnormality-assumptionrregressionresiduals

I would like to do a Shapiro Wilk's W test and Kolmogorov-Smirnov test on the residuals of a linear model to check for normality. I was just wondering what residuals should be used for this – the raw residuals, the Pearson residuals, studentized residuals or standardized residuals? For a Shapiro-Wilk's W test it appears that the results for the raw & Pearson residuals are identical but not for the others.

fit=lm(mpg ~ 1 + hp + wt, data=mtcars)
res1=residuals(fit,type="response")
res2=residuals(fit,type="pearson")
res3=rstudent(fit)
res4=rstandard(fit)
shapiro.test(res1) # W = 0.9279, p-value = 0.03427
shapiro.test(res2) # W = 0.9279, p-value = 0.03427
shapiro.test(res3) # W = 0.9058, p-value = 0.008722
shapiro.test(res4) # W = 0.9205, p-value = 0.02143

Same question for K-S, and also whether the residuals should be tested against a normal distribution (pnorm) as in

ks.test(res1, "pnorm") # D = 0.296, p-value = 0.005563

or a t-student distribution with n-k-2 degrees of freedom, as in

ks.test(res3, "pt",df=nrow(mtcars)-2-2) 

Any advice perhaps? Also, what are recommended values for the test statistics W (>0.9?) and D in order for the distribution to be sufficiently close to normality and not affect your inference too much?

Finally, does this approach take into account the uncertainty in the fitted lm coefficients, or would function cumres() in package gof() be better in this respect?

cheers,
Tom

Best Answer

Grew too long for a comment.

  1. For an ordinary regression model (such as would be fitted by lm), there's no distinction between the first two residual types you consider; type="pearson" is relevant for non-Gaussian GLMs, but is the same as response for gaussian models.

  2. The observations you apply your tests to (some form of residuals) aren't independent, so the usual statistics don't have the correct distribution. Further, strictly speaking, none of the residuals you consider will be exactly normal, since your data will never be exactly normal. [Formal testing answers the wrong question - a more relevant question would be 'how much will this non-normality impact my inference?', a question not answered by the usual goodness of fit hypothesis testing.]

  3. Even if your data were to be exactly normal, neither the third nor the fourth kind of residual would be exactly normal. Nevertheless it's much more common for people to examine those (say by QQ plots) than the raw residuals.

  4. You could overcome some of the issues in 2. and 3. (dependence in residuals as well as non-normality in standardized residuals) by simulation conditional on your design matrix ($\mathbf{X}$), meaning you could use whichever residuals you like (however you can't deal with the "answering an unhelpful question you already know the answer to" problem that way).

Related Question