R Residuals – Understanding Standardized Residuals in R’s lm Output

diagnostic, lm, r, residuals, standardization

When I draw the diagnostic plots for a regression in R, a couple of them have "Standardized Residuals" as their y-axis label.

What are the residuals standardized over? That is, let us assume that in my model, there are 100 predicted values; hence 100 residuals.

  1. Is the standardized residual defined as $(e_i - \bar e)/s_e$, that is, (realized residual - mean of all 100 realized residuals)/(standard deviation of all 100 realized residuals)?
  2. Since each residual $e_i$ is itself a realized value out of a distribution of possible realizations for this single residual $e_i$, is this residual $e_i$ normalized by its own mean $\bar e_i$ and variance $\text{Var}(e_i)$ (as opposed to the mean and variance from all other values 1 to 100 as described above)?

I tried finding documentation clarifying this distinction but could not find any that was beyond doubt.

Best Answer

If you look at the code for plot.lm (by typing stats:::plot.lm), you see these snippets in there (the comments are mine; they're not in the original):

r <- residuals(x)                                # <---  r contains residuals

...

if (any(show[2L:6L])) {
    s <- if (inherits(x, "rlm")) 
        x$s
    else if (isGlm) 
        sqrt(summary(x)$dispersion)   
    else sqrt(deviance(x)/df.residual(x))        #<---- value of s
    hii <- lm.influence(x, do.coef = FALSE)$hat  #<---- value of hii

...

    r.w <- if (is.null(w)) 
        r                                        #<-- r.w  for unweighted regression
    else sqrt(w) * r
    rs <- dropInf(r.w/(s * sqrt(1 - hii)), hii)  # <-- std. residual in plots

So - if you don't use weights - the code clearly defines its standardized residuals to be the internally studentized residuals defined here:

http://en.wikipedia.org/wiki/Studentized_residual#How_to_studentize

which is to say:

$${\widehat{\varepsilon}_i\over \widehat{\sigma} \sqrt{1-h_{ii}\ }}$$

(where $\widehat{\sigma}^2={1 \over n-m}\sum_{j=1}^n \widehat{\varepsilon}_j^{\,2}$, and $m$ is the column dimension of $X$).
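
As a quick check of the formula above, here is a minimal sketch that recomputes the standardized residuals by hand and compares them with `rstandard()`. The dataset (`cars`, which ships with base R) and the variable names (`fit`, `rs_manual`, `z_naive`) are my own choices for illustration and do not appear in `plot.lm`; the example assumes an ordinary unweighted `lm` fit.

```r
# Fit a small example model on a built-in dataset (an assumption for illustration).
fit <- lm(dist ~ speed, data = cars)

e <- residuals(fit)                       # raw residuals
h <- hatvalues(fit)                       # leverages h_ii (diagonal of the hat matrix)
s <- sqrt(sum(e^2) / df.residual(fit))    # sigma-hat = sqrt(RSS / (n - m))

# Internally studentized residuals, computed directly from the formula.
rs_manual <- e / (s * sqrt(1 - h))

# Should agree with the built-in helper up to floating-point tolerance.
all.equal(rs_manual, rstandard(fit))

# For contrast: a plain z-score over all residuals (option 1 in the question).
# It uses one common scale for every residual and ignores the leverages.
z_naive <- (e - mean(e)) / sd(e)
isTRUE(all.equal(unname(z_naive), unname(rs_manual)))
```

The first comparison should come out equal and the second should not, which suggests the answer to the question is closer to option 2: each residual is scaled by an estimate of its own standard deviation, $\widehat{\sigma}\sqrt{1-h_{ii}}$, rather than by the sample standard deviation of all the residuals.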
