Regression – How to Calculate the Standard Error of Residuals

regression

I'm not clear on residuals in regression having their own standard error.

At first I thought the "standard error of a residual," as used when standardizing a set of residuals, meant the standard error computed from all the observed residuals. Then I learned that it refers to the individual residual and its own probability distribution.

Apparently residuals can be standardized by the following:

$$\frac{e_i}{s \sqrt{1-h_{ii}}}$$

where $h_{ii}$ is the corresponding diagonal term of the hat matrix. So is $s$ the standard deviation of the observed residuals? And how does this denominator reflect the probability distribution of the individual residual?

Best Answer

Consider a classical regression model: $$Y_i = \beta^T x_i + \varepsilon_i$$ where $\beta\in\mathbb{R}^p$, the $x_i$ are $n$ vectors of $\mathbb{R}^p$, and the $\varepsilon_i$ are $n$ i.i.d. normal random variables with expectation $0$ and standard deviation $\sigma$.

The ordinary least squares estimator is given by: $$\hat\beta = \Big(\sum_i x_i x_i^T \Big)^{-1} \sum_i x_i Y_i = (X^T X)^{-1}X^T Y$$ where $X$ is the $n \times p$ matrix whose rows are the $x_i^T$ and $Y = (Y_1, \dots, Y_n)^T$ is a random vector of $\mathbb{R}^n$.

The predicted values $\hat Y_i$ are given by $$\hat Y_i = \hat \beta^T x_i .$$ If we stack these predicted values in a vector $\hat Y$, it can be expressed as $$\hat Y = X \hat \beta = X (X^TX)^{-1}X^T Y = HY$$ where $H = X (X^TX)^{-1}X^T$.
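As a quick numerical sanity check (a minimal sketch using NumPy; the design matrix and coefficient values below are arbitrary illustrative choices), $\hat\beta$ and the hat matrix can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))             # design matrix: rows are x_i^T
beta = np.array([1.0, -2.0, 0.5])       # arbitrary "true" coefficients
Y = X @ beta + rng.normal(size=n)       # errors: i.i.d. N(0, 1)

# OLS estimator: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X^T X)^{-1} X^T and fitted values Y_hat = H Y
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = H @ Y

# The two expressions for the fitted values agree: X beta_hat == H Y
assert np.allclose(Y_hat, X @ beta_hat)
```

Note that $H$ depends only on $X$, not on $Y$: it is the orthogonal projector onto the column space of $X$.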

Now we define the residuals by $$R_i = Y_i- \hat Y_i .$$ The residual $R_i$ can be viewed as an estimate of the error $\varepsilon_i$, since $\varepsilon_i = Y_i - \beta^T x_i$ and $R_i = Y_i - \hat \beta^T x_i$.

The vector of residuals $R$ is expressed as: $$R = Y - \hat Y = Y - HY = (I - H) Y .$$

Now you can easily get the covariance matrix of the residual vector $R$. As $var(Y) = \sigma^2 I$, you get that $$var(R) = (I - H) \sigma^2 I (I- H)^T = \sigma^2 (I - H)$$ using the fact that for any fixed matrix $A$ and random vector $Z$, $var(A Z) = A \, var(Z) \, A^T$, and the fact that $I - H$ is symmetric and idempotent, so $(I - H)(I - H)^T = I - H$.
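This covariance identity is easy to check numerically (a sketch with simulated data and $\sigma = 1$). Since $HX = X$, we have $(I - H)X = 0$, so the residual vector satisfies $R = (I - H)\varepsilon$, and the empirical covariance of $R$ over many simulated error vectors should match $\sigma^2(I - H)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2
X = rng.normal(size=(n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H                        # I - H

# I - H is symmetric and idempotent, so (I - H)(I - H)^T = I - H
assert np.allclose(M @ M.T, M)

# Since (I - H) X = 0, the residuals are R = (I - H) Y = (I - H) eps.
# Monte Carlo check that cov(R) is close to sigma^2 (I - H) with sigma = 1.
eps = rng.normal(size=(200_000, n))      # each row is one error vector
R = eps @ M.T                            # residual vectors, one per row
assert np.allclose(np.cov(R, rowvar=False), M, atol=0.02)
```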

The variance of the residual $R_i$ isn't $\sigma^2$ but $\sigma^2(1 - h_{ii})$ (the $i$-th diagonal term of its variance-covariance matrix), hence dividing it by $\sigma \sqrt{1 - h_{ii}}$ makes it standard.

But then you may think: "OK, the $R_i$ should be standardized by $\sigma \sqrt{1 - h_{ii}}$, but $\sigma^2$ is unknown... If I use $S^2$ instead (the sample variance of the residuals), I shouldn't need the $1 - h_{ii}$ correction, since $S^2$ is the sample variance of the residuals, not of the errors! After all, a sample variance should be a consistent estimator of the variance."

But no. You should still standardize by $S\sqrt{1 - h_{ii}}$, not by $S$ alone. The sample variance is a consistent estimator of the variance only when computed from independent observations, and here the $R_i$ are not independent, since the matrix $I - H$ is not diagonal.
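Putting the pieces together, the standardized (internally studentized) residuals can be computed as follows (a sketch with simulated data; here $S^2$ is taken as the degrees-of-freedom-corrected $\mathrm{RSS}/(n-p)$, the usual unbiased choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
R = Y - X @ beta_hat                               # residuals
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))     # leverages h_ii, each in (0, 1)

S = np.sqrt(np.sum(R**2) / (n - p))                # estimate of sigma
standardized = R / (S * np.sqrt(1 - h))            # R_i / (S sqrt(1 - h_ii))
```

A useful side fact: the leverages sum to $p$, since $\mathrm{tr}(H) = \mathrm{rank}(X) = p$.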

Actually, if the $\varepsilon_i$ are normally distributed, Cochran's theorem implies that $$\frac{1}{\sigma^2}\sum_i R_i^2 \sim \chi^2_{n-p},$$ hence the (uncorrected) sample variance $\frac{1}{n}\sum_i R_i^2$ has expected value $\frac{n-p}{n}\sigma^2 \to_{n\to \infty} \sigma^2$.

So the sample variance of the residuals is not a consistent estimator of the variance of an individual residual (the residuals are neither independent nor identically distributed, each $R_i$ having variance $\sigma^2(1 - h_{ii})$), but it is a consistent estimator of the error variance $\sigma^2$, hence the standardization by $S\sqrt{1 - h_{ii}}$.
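A small simulation illustrates the $\frac{n-p}{n}\sigma^2$ expectation (a sketch; the "sample variance" here is the uncorrected mean of squared residuals, and the dimensions are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 20, 4, 1.0
X = rng.normal(size=(n, p))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # I - H

# Simulate many error vectors; the residuals are R = (I - H) eps
eps = rng.normal(scale=sigma, size=(100_000, n))
R = eps @ M.T

# Uncorrected sample variance of the residuals, (1/n) sum_i R_i^2
s2_naive = np.mean(R**2, axis=1)

# Its average is close to (n - p)/n * sigma^2 = 0.8, not sigma^2 = 1
assert abs(s2_naive.mean() - (n - p) / n * sigma**2) < 0.01
```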

I hope this helps.
