Statistics – Calculating Variance of Error Term in Least Squares Method

linear-regression, statistical-inference, statistics

We have

$$y_i = \beta_0 + \beta_1x_i + \epsilon_i$$ and $$ \hat{y_i} = \hat{\beta_0} + \hat{\beta_1}x_i $$

where $\epsilon_i \sim N(0, \sigma^2)$.

Let $$e_i = y_i - \hat{y_i}.$$

I showed that

$$E(e_i) = E(\beta_0 + \beta_1x_i + \epsilon_i - \hat{\beta_0} - \hat{\beta_1}x_i) = \beta_0 + \beta_1x_i - \beta_0 - \beta_1x_i = 0$$

I want to show further that $V(e_i) = \sigma^2$, but I am having trouble proving it:

\begin{align}
E(e_i^2) &= E\big((y_i - \hat{y_i})^2\big) = E(y_i^2) + E(\hat{y_i}^2) - 2E(y_i\hat{y_i}) \\
&= V(y_i) + E(y_i)^2 + V(\hat{y_i}) + E(\hat{y_i})^2 - 2E\big((\beta_0 + \beta_1x_i + \epsilon_i)(\hat{\beta_0} + \hat{\beta_1}x_i)\big)
\end{align}

I can see from here that I will get stuck calculating $E(\epsilon_i \hat{\beta_0})$ or $E(\epsilon_i \hat{\beta_1})$.
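
(For concreteness, here is a minimal Monte Carlo sketch of that problematic cross-term, assuming NumPy; the design points, coefficients, $\sigma$, and seed are arbitrary illustrative choices. It estimates $E(\epsilon_i\hat\beta_1)$ empirically.)

```python
import numpy as np

# Monte Carlo estimate of the cross-term E(eps_i * beta1_hat).
# The design points, coefficients, sigma, and seed are arbitrary choices.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
beta0, beta1, sigma = 2.0, 0.5, 1.0
xbar = x.mean()
ssx = np.sum((x - xbar) ** 2)

i = 0                      # track the first observation
n_rep = 200_000
acc = 0.0
for _ in range(n_rep):
    eps = rng.normal(0.0, sigma, size=x.size)
    y = beta0 + beta1 * x + eps
    b1 = np.sum((x - xbar) * (y - y.mean())) / ssx   # OLS slope estimate
    acc += eps[i] * b1
print("Monte Carlo E(eps_i * beta1_hat):", acc / n_rep)
print("sigma^2 (x_i - xbar) / SSX      :", sigma**2 * (x[i] - xbar) / ssx)
```

The two printed values should agree up to Monte Carlo error: the cross-term is not zero, which is exactly why $V(e_i)$ does not come out to $\sigma^2$. The answer below makes this precise.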

Best Answer

The answer is $$ \operatorname{Var}(e_i) = \sigma^2\left(1-\frac1n-\frac{(x_i-\bar x)^2}{\text{SSX}}\right), $$ where SSX is shorthand for $\sum(x_i-\bar x)^2$.
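
Before the derivation, a quick simulation sanity check of this formula may help. This is a minimal sketch assuming NumPy, with arbitrary illustrative design points, coefficients, and seed:

```python
import numpy as np

# Compare the empirical variance of each OLS residual e_i with
# sigma^2 * (1 - 1/n - (x_i - xbar)^2 / SSX).
rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
n = x.size
beta0, beta1, sigma = 2.0, 0.5, 1.0
xbar = x.mean()
ssx = np.sum((x - xbar) ** 2)

n_rep = 100_000
eps = rng.normal(0.0, sigma, size=(n_rep, n))
y = beta0 + beta1 * x + eps                              # one row per replication
b1 = ((x - xbar) * (y - y.mean(axis=1, keepdims=True))).sum(axis=1) / ssx
b0 = y.mean(axis=1) - b1 * xbar
resid = y - (b0[:, None] + b1[:, None] * x)              # e_i for each replication

print("empirical Var(e_i):", resid.var(axis=0))
print("formula:           ", sigma**2 * (1 - 1/n - (x - xbar)**2 / ssx))
```

The two printed arrays should agree to within Monte Carlo error.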

The derivation is quite involved. Here is one approach. We require the formula for the variance of the difference of two random variables: $$ \operatorname{Var}(A-B)=\operatorname{Var}(A) + \operatorname{Var}(B) - 2\operatorname{Cov}(A,B).\tag{*} $$

  1. Write the $i$th residual in the form $$ e_i:= y_i-\hat y_i = (\epsilon_i-\bar\epsilon)-(\hat\beta_1-\beta_1)(x_i-\bar x)\tag1$$ by plugging the definitions of $y_i$ and $\hat y_i$ into $y_i-\hat y_i$, substituting $\hat \beta_0:=\bar y - \hat\beta_1 \bar x$, and using $\bar y = \beta_0+\beta_1\bar x+\bar\epsilon$ (the model averaged over $i$).

  2. Applying (*) to (1), the desired variance is $$\operatorname{Var}(e_i) = \operatorname{Var}(\epsilon_i-\bar \epsilon) + (x_i-\bar x)^2\operatorname{Var}(\hat\beta_1-\beta_1)-2(x_i-\bar x)\operatorname{Cov}(\epsilon_i-\bar\epsilon, \hat\beta_1-\beta_1).\tag2$$

  3. Using (*), calculate $$\operatorname{Var}(\epsilon_i-\bar\epsilon)=\operatorname{Var}(\epsilon_i) + \operatorname{Var}(\bar\epsilon) - 2\operatorname{Cov}(\epsilon_i,\bar\epsilon)=\sigma^2\left(1-\frac1n\right).\tag3$$ The tricky calculation is $\operatorname{Cov}(\epsilon_i,\bar\epsilon)=\sigma^2/n$, which follows once you observe that $\epsilon_i$ is independent of $\epsilon_k$ when $k\ne i$.

  4. The variance of $\hat\beta_1-\beta_1$ is well known to be $$\operatorname{Var}(\hat\beta_1-\beta_1)=\frac{\sigma^2}{\text{SSX}}.\tag4$$

  5. The covariance in (2) reduces to $E\big[(\epsilon_i-\bar\epsilon)(\hat\beta_1-\beta_1)\big]$, since $E(\hat\beta_1)=\beta_1$. Substitute the formula $$\hat\beta_1-\beta_1=\frac{\sum_k(x_k-\bar x)(\epsilon_k-\bar\epsilon)}{\text{SSX}}\tag 5$$ to obtain $$(\epsilon_i-\bar\epsilon)(\hat\beta_1-\beta_1)=\frac{\sum_k(x_k-\bar x)(\epsilon_i-\bar\epsilon)(\epsilon_k-\bar\epsilon)}{\text{SSX}}.\tag6 $$ Break up the sum in (6) into $\sum_{k=i} + \sum_{k\ne i}$ and take expectations. The answer will be $$\operatorname{Cov}(\epsilon_i-\bar\epsilon, \hat\beta_1-\beta_1)=E\big[(\epsilon_i-\bar\epsilon)(\hat\beta_1-\beta_1)\big]=\frac{(x_i-\bar x)\sigma^2}{\text{SSX}}.\tag 7$$

  6. Finally, substitute (3), (4), and (7) into (2): $$\operatorname{Var}(e_i)=\sigma^2\left(1-\frac1n\right)+\frac{(x_i-\bar x)^2\sigma^2}{\text{SSX}}-\frac{2(x_i-\bar x)^2\sigma^2}{\text{SSX}}=\sigma^2\left(1-\frac1n-\frac{(x_i-\bar x)^2}{\text{SSX}}\right),$$ as claimed.
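
The intermediate facts (3), (4), and (7) can also be checked numerically. A minimal sketch, assuming NumPy and the same kind of arbitrary illustrative setup as above:

```python
import numpy as np

# Monte Carlo check of (3), (4), and (7).
# The design points, sigma, the tracked index i, and the seed are arbitrary.
rng = np.random.default_rng(2)
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
n = x.size
sigma = 1.0
xbar = x.mean()
ssx = np.sum((x - xbar) ** 2)
i = 0

n_rep = 200_000
eps = rng.normal(0.0, sigma, size=(n_rep, n))
a = eps[:, i] - eps.mean(axis=1)     # eps_i - epsbar
b = (eps @ (x - xbar)) / ssx         # beta1_hat - beta1, per (5); the epsbar
                                     # term drops since the x-deviations sum to 0
print("(3):", a.var(), "vs", sigma**2 * (1 - 1/n))
print("(4):", b.var(), "vs", sigma**2 / ssx)
print("(7):", (a * b).mean(), "vs", sigma**2 * (x[i] - xbar) / ssx)
```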