Solved – Raw residuals versus standardised residuals versus studentised residuals – what to use when

goodness of fit, residuals

This looks like a similar question and didn't get many responses.

Omitting tests such as Cook's D, and just looking at residuals as a group, I am interested in how others use residuals when assessing goodness-of-fit. I use the raw residuals:

  1. in a QQ-plot, for assessing normality
  2. in a scatterplot of $y$ versus residuals, for an eyeball check of (a) heteroscedasticity and (b) serial autocorrelation.

For plotting $y$ versus residuals to see at which values of $y$ outliers may occur, I prefer to use the studentized residuals. The reason for my preference is that they make it easy to see which residuals at which $y$-values are problematic, although standardised residuals give an extremely similar result. My theory is that which of the two gets used depends on which university one went to.

Is this similar to how others use residuals? Do others use this number of graphs in combination with summary statistics?

Best Answer

This isn't so much an answer as a clarification on terminology. Your question asks about raw, standardized, and studentized residuals. However, this is not the terminology used by most statisticians, though I note your class notes state that it is.

Raw: same as you have it.

Standardized: this is actually the raw residual divided by the true standard deviation of the residuals. As the true standard deviation is rarely known, a standardized residual is almost never used.

Internally Studentized: because the true standard deviation of the residuals is not typically known, the estimated standard deviation is used instead. This is an internally studentized residual, and it is what you called standardized.

Externally Studentized: the same as the internally studentized residual, except that the estimate of the standard deviation of the residuals is calculated from a regression leaving out the observation in question.

Pearson: the raw residual divided by the standard deviation of the response variable (the y variable) rather than of the residuals. You don't have this one listed.

"leave one out": Doesn't have a formal name, but it is the same as the class notes.

Standardized "leave one out": also doesn't have a formal name, but this is not what the class notes call studentized.
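To make the difference between the raw, internally studentized, and externally studentized residuals concrete, here is a minimal NumPy sketch for an ordinary least squares fit. It is an illustration under stated assumptions, not anything taken from the class notes or from SAS: the function name `residual_types` and the simulated data are hypothetical, the design matrix `X` is assumed to have full column rank and to include an intercept column, and the estimated standard deviation of the i-th residual is taken to be $s\sqrt{1-h_{ii}}$, where $h_{ii}$ is the leverage of observation i.

```python
import numpy as np

def residual_types(X, y):
    """Return (raw, internally studentized, externally studentized) residuals.

    Hypothetical helper for illustration. X is an (n, p) design matrix that
    already contains the intercept column; y is the (n,) response vector.
    """
    n, p = X.shape

    # Ordinary least squares fit and raw residuals e_i = y_i - yhat_i.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    raw = y - X @ beta

    # Leverages h_ii (diagonal of the hat matrix), computed via a thin QR.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Internally studentized: the error variance is estimated from ALL points.
    rss = raw @ raw
    s2 = rss / (n - p)
    internal = raw / np.sqrt(s2 * (1.0 - h))

    # Externally studentized: the error variance is estimated with
    # observation i left out. This closed form avoids refitting n times.
    s2_loo = (rss - raw**2 / (1.0 - h)) / (n - p - 1)
    external = raw / np.sqrt(s2_loo * (1.0 - h))

    return raw, internal, external

# Simulated data, purely for demonstration.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(size=30)

raw, internal, external = residual_types(X, y)
print(np.column_stack([raw, internal, external])[:5])
```

For well-behaved points the two studentized versions come out nearly identical. They differ noticeably only for an observation with a large residual, because that observation inflates the variance estimate in the internal version but is excluded from it in the external one.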

Sources:

  1. the same wiki link you have about studentized residuals ("a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation")

  2. documentation for residual calculation in SAS