Regression – Proving $\text{Var}{(\hat{y}_h)} = \sigma^2 \left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{S_{xx}}\right)$

proof, regression, self-study

In another question I asked why $\text{Var}{(\hat{y}_h)} = \sigma^2 \left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{S_{xx}}\right)$. Here $\hat{y}_h = b_0 + b_1X_h$ is the fitted regression line evaluated at a given $X_h$.

This question concerns why the term $Cov(b_0,b_1)$ alone yields the RHS. Substituting $b_0 = Y - b_1X$, we get $Cov(Y,b_1) - XCov(b_1,b_1) = Cov(\frac{\sum{Y_i}}{n},\sum k_iY_i) - XVar{(b_1)}$. Here $X$ and $Y$ without subscripts are arithmetic means.

We can then rearrange to obtain $\sum \frac{k_i Var(Y_i)}{n} - \frac{X\sigma^2}{S_{xx}}$, which quickly yields the desired result. My question is: why does this work? This single term does not seem like it should yield the RHS on its own. Have I made an error in algebra?

Best Answer

$(1)\ E(\hat{Y_h}) = E(b_0 + b_1X_h) = \beta_0 +\beta_1X_h$

$(2)\ var(\hat{Y_h}) = var(b_0 + b_1X_h)$

An alternate (but equivalent) version of the regression model can be written as:

$Y_i = \beta_0X_0 + \beta_1X_i + \epsilon_i$

This model associates an X variable with each coefficient (where $X_0 = 1$).

Another modification is to use the deviation $X_i -\bar{X}$ rather than $X_i$.

So the model can be written as:

$Y_i = \beta_0^* + \beta_1(X_i - \bar{X}) + \epsilon_i$

where $(3)\ \beta_0^* = \beta_0 + \beta_1\bar{X}$

These models can be used interchangeably.

We know from the normal equations:

$\Sigma Y_i = nb_0 + b_1\Sigma X_i$

Solving for $b_0$:

$(4)\ b_0 = \bar{Y} - b_1\bar{X}$

So substituting from (3) and (4):

$b_0^* = b_0 + b_1\bar{X} = (\bar{Y} - b_1\bar{X}) + b_1\bar{X} = \bar{Y}$
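As a quick numerical sanity check of $b_0^* = \bar{Y}$, the sketch below fits the same line twice, once on $X_i$ and once on the centered predictor $X_i - \bar{X}$. The data values and the helper `ols_fit` are hypothetical, made up purely for illustration.

```python
# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [2.1, 2.9, 5.2, 7.8, 12.5]


def ols_fit(x, y):
    """Return (intercept, slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # equation (4)
    return intercept, slope


x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Fit on the original predictor.
b0, b1 = ols_fit(xs, ys)

# Refit on the centered predictor X_i - X_bar.
b0_star, b1_star = ols_fit([xi - x_bar for xi in xs], ys)

print("b0* :", b0_star, " Y_bar:", y_bar)
print("b1  :", b1, " b1 (centered fit):", b1_star)
```

Centering leaves the slope unchanged and moves the intercept to $\bar{Y}$, which is what makes the rewrite in the next step possible.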

$(5)\ var(\hat{Y_h}) = var(b_0 + b_1X_h) = var(\bar{Y} + b_1(X_h - \bar{X}))$

using:

$var(\bar{Y}) = \frac{\sigma^2}{n}$

$var(aX) = a^2var(X)$

and

$var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y)$

So:

$var(\hat{Y_h}) = var(\bar{Y}) +(X_h - \bar{X})^2var(b_1) + 2(X_h-\bar{X})cov(\bar{Y},b_1)$

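We use the fact that $Cov(\bar{Y},b_1) = 0$ under the i.i.d. assumption on $\epsilon_i$: writing $b_1 = \Sigma k_iY_i$ with $k_i = \frac{X_i-\bar{X}}{\Sigma(X_i-\bar{X})^2}$, we get $Cov(\bar{Y},b_1) = \frac{\sigma^2}{n}\Sigma k_i = 0$ because $\Sigma k_i = 0$. Together with $var(b_1) = \frac{\sigma^2}{\Sigma(X_i-\bar{X})^2}$, this gives: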
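The zero covariance can be checked exactly, without simulation, by treating $\bar{Y}$ and $b_1$ as linear combinations of the $Y_i$. The sketch below assumes uncorrelated $Y_i$ with common variance $\sigma^2$, and uses the weights $k_i = (X_i-\bar{X})/\Sigma(X_i-\bar{X})^2$ from the question. The design points and the value of `sigma2` are hypothetical.

```python
# Hypothetical design points and error variance, for illustration only.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
sigma2 = 2.5

n = len(xs)
x_bar = sum(xs) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)

# b1 = sum(k_i * Y_i) and Y_bar = sum(a_i * Y_i) with a_i = 1/n.
k = [(x - x_bar) / s_xx for x in xs]
a = [1.0 / n] * n

# For uncorrelated Y_i with common variance sigma2:
#   Cov(sum a_i Y_i, sum k_i Y_i) = sigma2 * sum(a_i * k_i)
#   Var(sum k_i Y_i)              = sigma2 * sum(k_i ** 2)
cov_ybar_b1 = sigma2 * sum(ai * ki for ai, ki in zip(a, k))
var_b1 = sigma2 * sum(ki ** 2 for ki in k)

print("sum of k_i     :", sum(k))
print("Cov(Y_bar, b1) :", cov_ybar_b1)
print("Var(b1)        :", var_b1, " sigma2 / S_xx:", sigma2 / s_xx)
```

Up to floating-point rounding, the covariance is zero because the $k_i$ sum to zero, and the variance of $b_1$ matches $\sigma^2/\Sigma(X_i-\bar{X})^2$.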

$= \frac{\sigma^2}{n} + (X_h-\bar{X})^2\frac{\sigma^2}{\Sigma(X_i-\bar{X})^2}$

$= \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\Sigma(X_i - \bar{X})^2}\right]$

which is the stated result, since $S_{xx} = \Sigma(X_i - \bar{X})^2$.
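The final formula can be cross-checked against the direct expansion $var(b_0) + X_h^2var(b_1) + 2X_hCov(b_0,b_1)$, which also shows what $Cov(b_0,b_1)$ contributes on its own. This is a minimal sketch under the same assumptions (uncorrelated $Y_i$ with common variance $\sigma^2$). The design points, `sigma2`, and `x_h` are hypothetical values.

```python
# Hypothetical design points, error variance, and prediction point.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
sigma2 = 2.5
x_h = 6.0

n = len(xs)
x_bar = sum(xs) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
k = [(x - x_bar) / s_xx for x in xs]

# Y_hat_h = Y_bar + b1 * (x_h - x_bar) = sum(w_i * Y_i)
w = [1.0 / n + (x_h - x_bar) * ki for ki in k]
var_direct = sigma2 * sum(wi ** 2 for wi in w)

# Closed form derived in the answer.
var_formula = sigma2 * (1.0 / n + (x_h - x_bar) ** 2 / s_xx)

# b0 = Y_bar - b1 * x_bar = sum(c_i * Y_i)
c = [1.0 / n - x_bar * ki for ki in k]
var_b0 = sigma2 * sum(ci ** 2 for ci in c)
var_b1 = sigma2 * sum(ki ** 2 for ki in k)
cov_b0_b1 = sigma2 * sum(ci * ki for ci, ki in zip(c, k))

# Three-term expansion of Var(b0 + b1 * x_h).
var_three_terms = var_b0 + x_h ** 2 * var_b1 + 2 * x_h * cov_b0_b1

print("direct      :", var_direct)
print("formula     :", var_formula)
print("three terms :", var_three_terms)
print("Cov(b0, b1) :", cov_b0_b1, " -x_bar*sigma2/S_xx:", -x_bar * sigma2 / s_xx)
```

In this sketch $Cov(b_0,b_1)$ equals $-\bar{X}\sigma^2/S_{xx}$, so it is only one of the three terms in the expansion. The full result needs $var(b_0)$ and $X_h^2var(b_1)$ as well.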