Solved – Proof that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$ in simple linear regression

regression, self-study, unbiased-estimator, variance

I know there's a similar post about this, but I believe my question is a bit different.

In my textbook the author rewrites

$-2(\hat{\beta}_1-\beta_1)\sum u_i (x_i-\bar{x})$

into

$-2(\hat{\beta}_1-\beta_1)^2\sum (x_i-\bar{x})^2$

He doesn't use any expectation or variance operator.

If you don't want to go through my whole calculation, you can just check my final result:

$-2(\hat{\beta}_1-\beta_1)((\hat{\beta}_1-\beta_1)\sum(x_i-\bar{x})^2 +\sum\hat{u}_i(x_i-\bar{x}))$

How can I get rid of the last term?

I proceeded as follows:

use

$u_i=y_i-\beta_0-\beta_1x_i$

and substitute it into the first equation:

$-2(\hat{\beta}_1-\beta_1)\sum (y_i-\beta_0-\beta_1x_i) (x_i-\bar{x})$

use

$y_i=\hat{y}_i+\hat{u}_i$, i.e. $y_i=\hat{\beta}_0+ \hat{\beta}_1 x_i+\hat{u}_i$,

and substitute it into the following equation

$-2(\hat{\beta}_1-\beta_1)\sum (y_i-\beta_0-\beta_1x_i) (x_i-\bar{x})$

to obtain

$-2(\hat{\beta}_1-\beta_1)\sum ((\hat{\beta}_0+ \hat{\beta}_1 x_i+\hat{u}_i)-\beta_0-\beta_1x_i) (x_i-\bar{x})$

removing the inner parentheses gives

$-2(\hat{\beta}_1-\beta_1)\sum (\hat{\beta}_0+ \hat{\beta}_1 x_i+\hat{u}_i-\beta_0-\beta_1x_i) (x_i-\bar{x})$

distribute $(x_i-\bar{x})$ across the terms:

$-2(\hat{\beta}_1-\beta_1)\sum (\hat{\beta}_0(x_i-\bar{x})+ \hat{\beta}_1 x_i(x_i-\bar{x})+\hat{u}_i(x_i-\bar{x})-\beta_0(x_i-\bar{x})-\beta_1x_i(x_i-\bar{x}))$

enclose the summation in brackets:

$-2(\hat{\beta}_1-\beta_1)(\sum (\hat{\beta}_0(x_i-\bar{x})+ \hat{\beta}_1 x_i(x_i-\bar{x})+\hat{u}_i(x_i-\bar{x})-\beta_0(x_i-\bar{x})-\beta_1x_i(x_i-\bar{x})))$

split the sum termwise and pull the constants out:

$-2(\hat{\beta}_1-\beta_1)(\hat{\beta}_0\sum(x_i-\bar{x})+ \hat{\beta}_1 \sum x_i(x_i-\bar{x})+\sum\hat{u}_i(x_i-\bar{x})-\beta_0\sum(x_i-\bar{x})-\beta_1\sum x_i(x_i-\bar{x}))$

$\sum(x_i-\bar{x})$ equals zero (deviations from the mean always sum to zero, since $\sum(x_i-\bar{x}) = \sum x_i - n\bar{x} = 0$), therefore

$-2(\hat{\beta}_1-\beta_1)(\hat{\beta}_1 \sum x_i(x_i-\bar{x})+\sum\hat{u}_i(x_i-\bar{x})-\beta_1\sum x_i(x_i-\bar{x}))$

use $\sum x_i(x_i-\bar{x}) = \sum(x_i-\bar{x})^2$

$-2(\hat{\beta}_1-\beta_1)(\hat{\beta}_1 \sum(x_i-\bar{x})^2 +\sum\hat{u}_i(x_i-\bar{x})-\beta_1\sum(x_i-\bar{x})^2)$
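
(In case this identity is not obvious: writing $x_i = (x_i-\bar{x}) + \bar{x}$ gives

$\sum x_i(x_i-\bar{x}) = \sum(x_i-\bar{x})^2 + \bar{x}\sum(x_i-\bar{x}) = \sum(x_i-\bar{x})^2,$

using once more that $\sum(x_i-\bar{x})=0$.)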

factor out $\sum(x_i-\bar{x})^2$:

$-2(\hat{\beta}_1-\beta_1)((\hat{\beta}_1-\beta_1)\sum(x_i-\bar{x})^2 +\sum\hat{u}_i(x_i-\bar{x}))$

How do I now get rid of the last term?

Best Answer

We want to show that $\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) (x_i - \bar{x}) = 0$, which is the same as $\sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i ) (x_i - \bar{x}) = \sum_{i=1}^{n} y_i (x_i - \bar{x})$. Roughly speaking, this says that the weighted sum of the fitted $y$ values equals the weighted sum of the observed $y$ values, with weights $x_i - \bar{x}$. For this we only need some algebra and the definitions $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ and $\hat{\beta}_1 = \sum_{i=1}^{n} (y_i - \bar{y}) (x_i - \bar{x}) / \sum_{i=1}^{n} (x_i - \bar{x})^2$. Let's start with the left-hand side:

\begin{align} \sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i ) (x_i - \bar{x}) &= \sum_{i=1}^{n} (\bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 x_i ) (x_i - \bar{x}) \\ &= \bar{y} \sum_{i=1}^{n} (x_i - \bar{x}) + \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 . \end{align}

We know the first term is zero, and the sum of squares $\sum_{i=1}^{n} (x_i - \bar{x})^2$ cancels with the denominator of $\hat{\beta}_1$, leaving us with just $\sum_{i=1}^{n} (y_i - \bar{y}) (x_i - \bar{x}) = \sum_{i=1}^{n} y_i (x_i - \bar{x})$, which is what we wanted to show. In your notation this means $\sum\hat{u}_i(x_i-\bar{x}) = 0$, so your final expression collapses to $-2(\hat{\beta}_1-\beta_1)^2\sum(x_i-\bar{x})^2$, exactly the form in your textbook.
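
If you want a quick numerical sanity check before (or after) working through the algebra, here is a minimal sketch in Python with NumPy; the simulated model and all variable names are my own choices, not from the textbook. It fits OLS via the formulas above and verifies both that $\sum\hat{u}_i(x_i-\bar{x})$ vanishes and that the textbook's rewrite holds exactly for a single realization, with no expectation taken:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
beta0, beta1, sigma = 1.0, 2.0, 0.5    # true parameters (arbitrary choices)

x = rng.normal(size=n)
u = rng.normal(scale=sigma, size=n)    # true errors u_i
y = beta0 + beta1 * x + u

xc = x - x.mean()                      # centered regressor, x_i - x_bar
beta1_hat = np.sum((y - y.mean()) * xc) / np.sum(xc**2)
beta0_hat = y.mean() - beta1_hat * x.mean()
u_hat = y - beta0_hat - beta1_hat * x  # OLS residuals u_hat_i

# the claim above: residuals are orthogonal to the centered regressor
print(np.sum(u_hat * xc))              # ~0 up to floating-point error

# the textbook's rewrite, checked realization by realization
lhs = -2 * (beta1_hat - beta1) * np.sum(u * xc)
rhs = -2 * (beta1_hat - beta1) ** 2 * np.sum(xc**2)
print(lhs - rhs)                       # ~0: the two expressions coincide
```

Both quantities print as essentially zero, which is the point: the step is a pointwise algebraic identity, not a statement about expectations, so no expectation or variance operator is needed.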