Internally Studentized Residuals – Advantages of Internally Studentized Residuals Over Raw Estimated Residuals

residuals

I ask because internally studentized residuals seem to show the same pattern as raw estimated residuals. It would be great if someone could offer an explanation.

Best Answer

Assume a regression model $\bf{y} = \bf{X} \bf{\beta} + \bf{\epsilon}$ with design matrix $\bf{X}$ (a $\bf{1}$ column followed by your predictors), predictions $\hat{\bf{y}} = \bf{X} (\bf{X}' \bf{X})^{-1} \bf{X}' \bf{y} = \bf{H} \bf{y}$ (where $\bf{H}$ is the "hat-matrix"), and residuals $\bf{e} = \bf{y} - \hat{\bf{y}}$. The regression model assumes that the true errors $\bf{\epsilon}$ all have the same variance (homoskedasticity):

$$V(\bf{\epsilon}) = \sigma^{2} \bf{I}$$

Because $\bf{e} = (\bf{I} - \bf{H}) \bf{y}$ and $\bf{I} - \bf{H}$ is symmetric and idempotent, the covariance matrix of the residuals is $V(\bf{e}) = \sigma^{2} (\bf{I} - \bf{H})$. This means that the raw residuals $e_{i}$ have different variances $\sigma^{2} (1-h_{ii})$, the diagonal entries of the matrix $\sigma^{2} (\bf{I} - \bf{H})$. The diagonal elements of $\bf{H}$ are the hat-values $h_{ii}$.
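To make the dependence on the hat-values concrete, here is a minimal NumPy sketch. The design matrix and the value of $\sigma^{2}$ are made up for illustration (they are not from the question); the last observation is placed far from the others so that its hat-value is large.

```python
import numpy as np

# Hypothetical design: intercept column plus one predictor,
# with the last x value far away from the rest (high leverage).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)  # hat-values h_ii

sigma2 = 4.0  # assumed true error variance, for illustration only
resid_var = sigma2 * (1.0 - h)  # theoretical Var(e_i) = sigma^2 (1 - h_ii)

print("hat-values:        ", np.round(h, 3))
print("residual variances:", np.round(resid_var, 3))
```

The observation with the largest hat-value has the smallest residual variance, even though every error $\epsilon_{i}$ has the same variance.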

The truly standardized residuals with variance 1 throughout are thus $e_{i} / (\sigma \sqrt{1 - h_{ii}})$. The problem is that the error variance $\sigma^{2}$ is unknown, and internally / externally studentized residuals $e_{i} / (\hat{\sigma} \sqrt{1 - h_{ii}})$ result from particular choices for an estimate $\hat{\sigma}$: the internal version estimates $\sigma$ from the fit to all observations, while the external version estimates it from the fit with observation $i$ left out.
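The two choices of $\hat{\sigma}$ can be sketched as follows. This is an illustration under assumptions, not code from the answer: the function name `studentized_residuals` and the simulated data are hypothetical, and the leave-one-out estimate uses the standard closed-form deletion formula rather than refitting the model $n$ times.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return raw, internally and externally studentized OLS residuals."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # hat-values h_ii
    e = y - H @ y                           # raw residuals
    rss = e @ e                             # residual sum of squares
    # Internal: sigma^2 estimated from the full fit (all n observations)
    s2 = rss / (n - p)
    r_int = e / np.sqrt(s2 * (1.0 - h))
    # External: sigma^2 estimated without observation i (closed-form deletion formula)
    s2_del = (rss - e**2 / (1.0 - h)) / (n - p - 1)
    r_ext = e / np.sqrt(s2_del * (1.0 - h))
    return e, r_int, r_ext

# Hypothetical data: 19 ordinary points plus one high-leverage point at x = 30
rng = np.random.default_rng(0)  # arbitrary seed
x = np.append(rng.uniform(0.0, 10.0, size=19), 30.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 2.0, size=x.size)

e, r_int, r_ext = studentized_residuals(X, y)
# Columns: raw, internally studentized, externally studentized
print(np.round(np.column_stack([e, r_int, r_ext]), 3))
```

Both versions divide the same raw residual by the same $\sqrt{1 - h_{ii}}$; they differ only in the estimate of $\sigma$ that multiplies it.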

Since raw residuals are expected to be heteroskedastic even if the $\epsilon$ are homoskedastic, the raw residuals are theoretically less well suited to diagnose problems with the homoskedasticity assumption than standardized or studentized residuals.
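A small simulation can illustrate this point. The design, the value of $\sigma$, the seed, and the number of replications below are arbitrary assumptions chosen for the sketch. It draws homoskedastic normal errors many times for a fixed design and compares the per-observation variance of the raw residuals with that of the internally studentized residuals.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical fixed design with one high-leverage observation (x = 30)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

sigma = 2.0      # assumed true error standard deviation
n_sim = 20000    # number of simulated data sets

# Homoskedastic errors; residuals depend only on the errors: e = (I - H) eps
eps = rng.normal(0.0, sigma, size=(n_sim, n))
E = eps @ (np.eye(n) - H)            # one row of raw residuals per data set

# Internally studentized residuals for every simulated data set
s2 = np.sum(E**2, axis=1) / (n - p)
R = E / np.sqrt(s2[:, None] * (1.0 - h))

raw_var = E.var(axis=0)    # empirical variance of each raw residual
stud_var = R.var(axis=0)   # empirical variance of each studentized residual

print("theoretical sigma^2 (1 - h_ii):", np.round(sigma**2 * (1.0 - h), 2))
print("empirical raw variances:       ", np.round(raw_var, 2))
print("empirical studentized variances:", np.round(stud_var, 2))
```

The raw variances should track $\sigma^{2} (1 - h_{ii})$ and drop sharply for the high-leverage point, while the studentized variances should all sit close to 1. Conversely, when all hat-values are nearly equal, the factor $1 / (\hat{\sigma} \sqrt{1 - h_{ii}})$ is almost the same for every observation, so a plot of internally studentized residuals looks like a rescaled plot of the raw residuals, which may explain the similar pattern described in the question.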
