Solved – Studentized residuals vs. standardized residuals in an lm model

r, regression, residuals, terminology

Are "studentized residuals" and "standardized residuals" the same in regression models? I built a linear regression model in R and wanted to plot the graph of Studentized residuals v/s fitted values, but didn't find an automated way of doing this in R.

Suppose I have a model

library(MASS)

lm.fit <- lm(medv ~ lstat, data = Boston)

then plot(lm.fit) does not provide a plot of studentized residuals vs. fitted values, yet it does provide a plot of standardized residuals vs. fitted values.

I used plot(lm.fit$fitted.values, studres(lm.fit)) and it plots the desired graph. So I just want to confirm whether I am going about this the right way and that studentized and standardized residuals are not the same thing. If they are different, please provide their definitions and some guidance on how to calculate them. I searched the net and found it a bit confusing.

Best Answer

No, studentized residuals and standardized residuals are different (but related) concepts.

R does in fact provide the built-in functions rstandard() and rstudent() as part of influence.measures in the stats package. The same help page covers many related functions for leverage, Cook's distance, etc. rstudent() is essentially the same as MASS::studres(), which you can check for yourself like so:

> all.equal(MASS::studres(lm.fit), rstudent(lm.fit))
[1] TRUE
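To get the plot asked about in the question, a minimal sketch using the built-in function (refitting the Boston model from the question):

library(MASS)    # for the Boston data set
lm.fit <- lm(medv ~ lstat, data = Boston)

# studentized (leave-one-out) residuals against fitted values
plot(fitted(lm.fit), rstudent(lm.fit),
     xlab = "Fitted values", ylab = "Studentized residuals")
abline(h = 0, lty = 2)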

Standardized residuals are a way of estimating the error for a particular data point which takes into account the leverage/influence of the point. These are sometimes called "internally studentized residuals."

$$r_{i}=\frac{e_{i}}{s(e_{i})}=\frac{e_{i}}{\sqrt{MSE(1-h_{ii})}}$$
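As a quick check of this formula (a sketch, continuing with lm.fit from above; rstandard() is the built-in equivalent):

e   <- residuals(lm.fit)
h   <- hatvalues(lm.fit)               # leverages h_ii
mse <- sum(e^2) / df.residual(lm.fit)  # MSE = RSS / residual degrees of freedom
r   <- e / sqrt(mse * (1 - h))

all.equal(r, rstandard(lm.fit))        # TRUE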

The motivation behind standardized residuals is that even though our model assumes i.i.d. errors with fixed variance, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, the residuals $e_i$ cannot be i.i.d.: they are not independent (with an intercept in the model they always sum to exactly zero), and their variances differ, since $\mathrm{Var}(e_i) = \sigma^2(1-h_{ii})$ depends on the leverage $h_{ii}$. Dividing by $\sqrt{MSE(1-h_{ii})}$ puts the residuals back on a common scale.

Studentized residuals for any given data point are calculated from a model fit to every other data point except the one in question. These are variously called "externally studentized residuals," "deleted residuals," or "jackknifed residuals."
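To see the definition in action, here is a brute-force check for a single point (a sketch, continuing with lm.fit from above; this is not how rstudent() is computed internally):

i       <- 1
e_i     <- residuals(lm.fit)[i]   # residual from the full fit
h_i     <- hatvalues(lm.fit)[i]   # leverage from the full fit
sigma_i <- summary(lm(medv ~ lstat, data = Boston[-i, ]))$sigma  # error s.d. with point i removed

e_i / (sigma_i * sqrt(1 - h_i))   # matches rstudent(lm.fit)[i]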

This sounds computationally expensive (as if we had to fit a new model for every point), but in fact there is a way to compute it from the original model alone, without refitting. If the standardized residual is $r_i$, then the studentized residual $t_i$ is:

$$t_i=r_i \left( \frac{n-k-2}{n-k-1-r_{i}^{2}}\right) ^{1/2},$$

where $n$ is the number of observations and $k$ is the number of predictors, so $n-k-2$ is the residual degrees of freedom of the leave-one-out fit.
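A quick check of this identity (a sketch, continuing with lm.fit from above; rstudent() is the built-in equivalent):

r   <- rstandard(lm.fit)
n   <- nobs(lm.fit)
k   <- length(coef(lm.fit)) - 1      # number of predictors, excluding the intercept
t_i <- r * sqrt((n - k - 2) / (n - k - 1 - r^2))

all.equal(t_i, rstudent(lm.fit))     # TRUE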

The motivation behind studentized residuals comes from their use in outlier testing. If we suspect a point is an outlier, then it was not generated from the assumed model, by definition. Therefore it would be a mistake - a violation of assumptions - to include that outlier in the fitting of the model. Studentized residuals are widely used in practical outlier detection.
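For example, a simple Bonferroni-corrected outlier screen based on studentized residuals might look like this (a sketch, continuing with lm.fit from above; the car package's outlierTest() implements the same idea):

t_i   <- rstudent(lm.fit)
n     <- nobs(lm.fit)
k     <- length(coef(lm.fit)) - 1
p_val <- 2 * pt(abs(t_i), df = n - k - 2, lower.tail = FALSE)  # two-sided p-value per point
p_adj <- pmin(n * p_val, 1)                                    # Bonferroni correction for n tests

head(sort(p_adj))   # the smallest adjusted p-values flag the most suspicious points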

Studentized residuals also have the desirable property that, for each data point, the residual follows a Student's t-distribution, assuming the normality assumptions of the original regression model are met. (Standardized residuals do not have so nice a distribution.)

Lastly, to address any concern that the R library might follow nomenclature different from the above, the R documentation (?influence.measures) explicitly uses "standardized" and "studentized" in exactly the senses described above:

Functions rstandard and rstudent give the standardized and Studentized residuals respectively. (These re-normalize the residuals to have unit variance, using an overall and leave-one-out measure of the error variance respectively.)