Regression – How to Show $ (\hat{\beta_1}-\beta_1) $ and $ \bar{u} $ Are Uncorrelated in Econometrics

econometrics, regression, self-study

I asked this question and thought I had found a solution by working through the steps in my textbook. Unfortunately, the comments told me that I was completely wrong and that my proof only worked out by chance, so now I'm really lost and frustrated.

To fix notation, the model is
$$
y_i = \beta_0 + \beta_1 x_i + u_i,
$$
where $u_i$ is the error term.

Earlier I showed that
$\hat{\beta}_1 = \beta_1 + \sum_{i=1}^n w_i u_i$, where
$w_i = \frac{x_i - \bar{x}}{SST_x}$, and $SST_x = \sum_{i=1}^n (x_i - \bar{x})^2$.
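
For reference, the standard way to obtain this decomposition is to substitute the model into the OLS slope formula $\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})\, y_i}{SST_x}$ and use $\sum_{i=1}^n (x_i - \bar{x}) = 0$ together with $\sum_{i=1}^n (x_i - \bar{x})\, x_i = SST_x$:
\begin{align}
\hat{\beta}_1 &= \frac{\sum_{i=1}^n (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{SST_x} \\
&= \beta_1\,\frac{\sum_{i=1}^n (x_i - \bar{x})\, x_i}{SST_x} + \frac{\sum_{i=1}^n (x_i - \bar{x})\, u_i}{SST_x}
= \beta_1 + \sum_{i=1}^n w_i u_i .
\end{align}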

Here is what I did so far:
\begin{align}
E[(\hat{\beta}_1-\beta_1) \bar{u}] &= E\left[\bar{u}\sum_{i=1}^n w_i u_i\right] \\
&= \sum_{i=1}^n E[w_i \bar{u} u_i] \\
&= \sum_{i=1}^n w_i E[\bar{u} u_i] \\
&= \sum_{i=1}^n w_i \left(Cov(\bar{u}, u_i) + E(\bar{u})E(u_i)\right) \\
&= \sum_{i=1}^n w_i Cov(\bar{u}, u_i) \\
&= \sum_{i=1}^n w_i Cov\left(\frac{1}{n} \sum_{j=1}^n u_j,\; u_i\right) \\
&= \frac{1}{n}\sum_{i=1}^n w_i Cov\left(\sum_{j=1}^n u_j,\; u_i\right) \\
&= \frac{1}{n}\sum_{i=1}^n w_i E\left(\left[u_i - E(u_i)\right]\left[\sum_{j=1}^n u_j - E\left(\sum_{j=1}^n u_j\right)\right]\right) \\
&= \frac{1}{n}\sum_{i=1}^n w_i E\left(\left[u_i - E(u_i)\right]\left[\sum_{j=1}^n u_j - \sum_{j=1}^n E(u_j)\right]\right) \\
&= \frac{1}{n}\sum_{i=1}^n w_i E\left(\left[u_i - 0\right]\left[\sum_{j=1}^n u_j - 0\right]\right) \\
&= \frac{1}{n}\sum_{i=1}^n w_i E\left(u_i\sum_{j=1}^n u_j\right) \\
&= \frac{1}{n}\sum_{i=1}^n w_i \left[E\left(u_i u_1 + \cdots + u_i u_n\right)\right] \\
&= \frac{1}{n}\sum_{i=1}^n w_i \left[E(u_i u_1) + \cdots + E(u_i u_n)\right]
\end{align}

At this point, I want to say that because we assume the errors for each observation are i.i.d.:

\begin{align}
&= \frac{1}{n}\sum_{i=1}^n w_i \left[E(u_i) E(u_1) + \cdots + E(u_i) E(u_n)\right]
\end{align}

But if this is the case, then since $E(u_i) = 0$ for all $i$, everything just cancels and I never need to use the definition of $w_i$, which one of the comments on my last attempt said I would need.

This is in chapter 2 of my book, and the book says "it's obvious that…", so maybe it really is obvious. I keep asking for hints, and even though I feel like I don't fully understand them, I want to keep trying.

Is this correct? If not, could anyone offer a hint on this part of the problem, assuming I'm at least on the right path? I can probably make the steps less verbose, but I want to make sure the argument is right first. The last time I worked this out, I reached the right conclusion but the proof itself was wrong, so I worry I've only gotten it right by chance again.

Best Answer

Strategy

It can be illuminating to move back and forth among three points of view: the statistical one (viewing $x_i$ and $y_i$ as data), a geometrical one (where least squares solutions are just projections in suitable Euclidean spaces), and an algebraic one (manipulating symbols representing matrices or linear transformations). Doing this not only streamlines the ideas, but also exposes the assumptions needed to make the result true, which otherwise might be buried in all the summations.

Notation and assumptions

So: let $y = X\beta + u$, where $y$ is an $n$-vector, $X$ is an $n$ by $p+1$ "design matrix" whose first column is all ones, $\beta$ is a $p+1$-vector of true coefficients, and $u$ is an $n$-vector whose components are iid random variables with zero expectation and common variance $\sigma^2$. (Let's continue, as in the question, to begin the coefficient vector's indices with $0$, writing ${\beta} = ({\beta_0}, {\beta_1}, \ldots, {\beta_p})$.) This generalizes the question, for which $X$ has just two columns: its second column is the vector $(x_1, x_2, \ldots, x_n)'$.
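
Concretely, in the two-column setting of the question these objects are
$$
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad
u = (u_1, u_2, \ldots, u_n)'.
$$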

Basic properties of OLS regression

The regression estimate $\hat{\beta}$ is a $p+1$-vector obtained by applying a linear transformation $\mathbb{P}$ to $y$. As the solution to the regression, $\mathbb{P}$ recovers the true coefficients when applied to the exact (noiseless) values $X\beta$: $$\mathbb{P}\left(X\beta\right) = \beta.$$ Finally--and this is the crux of the matter before us--it is obvious statistically (thinking of these values as data) that regressing the all-ones response $y = (1, 1, \ldots, 1)' = 1_n'$ yields the unique solution $\hat{\beta} = (1, 0, 0, \ldots, 0)'$: $$\mathbb{P}1_n' = (1, 0, 0, \ldots, 0)',$$ because when all the responses $y_i$ are equal to $1$, the fitted intercept must be $1$ and all the other coefficients must vanish. That's all we need to know. (Having a formula for $\mathbb{P}$ in terms of $X$ is unimportant, and distracting.)
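
If it helps to make these two facts tangible, here is a quick numerical check (a minimal sketch, assuming Python with numpy is available; the least-squares solver plays the role of $\mathbb{P}$, so we never need a formula for it):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])      # design matrix: column of ones, then x
    beta = np.array([2.0, -1.5])              # arbitrary "true" coefficients

    # Applying P to the exact values X @ beta recovers beta
    b_exact = np.linalg.lstsq(X, X @ beta, rcond=None)[0]
    print(np.allclose(b_exact, beta))         # True

    # Applying P to the all-ones response gives (1, 0, ..., 0)
    b_ones = np.linalg.lstsq(X, np.ones(n), rcond=None)[0]
    print(np.allclose(b_ones, [1.0, 0.0]))    # True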

Easy preliminaries

Begin with some straightforward algebraic manipulation of the original expression:

\begin{align}
(\hat{\beta}-\beta)\bar{u} &= (\mathbb{P}y-\beta)\bar{u} \\
&= (\mathbb{P}(X\beta+u)-\beta)\bar{u} \\
&= \mathbb{P}(X\beta+u)\bar{u} - \beta\bar{u} \\
&= (\mathbb{P}X\beta)\bar{u} + \mathbb{P}u\bar{u} - \beta\bar{u} \\
&= \beta\bar{u} + \mathbb{P}u\bar{u} - \beta\bar{u} \\
&= \mathbb{P}(u\bar{u}).
\end{align}

This almost mindless sequence of steps--each leads naturally to the next by simple algebraic rules--is motivated by the desire to (a) express the random variation purely in terms of $u$, whence it all derives, and (b) introduce $\mathbb{P}$ so that we can exploit its properties.

Computing the expectation

Taking the expectation can no longer be put off, but because $\mathbb{P}$ is a linear operator, it commutes with expectation: we only need to apply it to $\mathbb{E}[u\bar{u}]$. We could employ some formal matrix operations to work out this expectation, but there's an easier way. Recalling that the $u_i$ are iid, it is immediate that all the entries of $\mathbb{E}[u\bar{u}]$ must be the same. Since they are the same, each one equals their average. That average can be obtained by averaging the components of $u$ (which gives $\bar{u}$), multiplying by $\bar{u}$, and taking the expectation. But that's just a recipe for finding $$\mathbb{E}[\bar{u}\bar{u}] = \text{Var}[\bar{u}] = \sigma^2/n.$$ It follows that $\mathbb{E}[u\bar{u}]$ is a vector of $n$ values, all of which equal $\sigma^2/n$; that is, $\mathbb{E}[u\bar{u}] = (\sigma^2/n)\,1_n'$. Using our previous vector shorthand we may write $$\mathbb{E}[(\hat{\beta}-\beta)\bar{u}] = \mathbb{E}[\mathbb{P}(u\bar{u})] = \mathbb{P}\,\mathbb{E}[u\bar{u}] = \mathbb{P}1_n'\,\sigma^2/n = (\sigma^2/n, 0, 0, \ldots, 0)'.$$
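
Equivalently, one can verify each entry of $\mathbb{E}[u\bar{u}]$ directly: for any $i$,
$$\mathbb{E}[u_i\bar{u}] = \mathbb{E}\left[u_i \cdot \frac{1}{n}\sum_{j=1}^n u_j\right] = \frac{1}{n}\sum_{j=1}^n \mathbb{E}[u_i u_j] = \frac{1}{n}\,\mathbb{E}[u_i^2] = \frac{\sigma^2}{n},$$
because independence and the zero mean make every cross term $\mathbb{E}[u_i u_j]$ with $j \ne i$ vanish, leaving only the $j = i$ term.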

Conclusion

This says that the estimated coefficients $\hat{\beta}_i$ and the mean error $\bar{u}$ are uncorrelated for $i=1, 2, \ldots, p$, but not so for $\hat{\beta}_0$ (the intercept).


It is instructive to review the steps and consider which assumptions were essential and which elements of the regression apparatus simply did not appear in the demonstration. We should expect that any proof, no matter how elementary or sophisticated, will need the same (or stronger) assumptions and will need, in some guise or another, to include calculations of $\mathbb{P}1_n'$ and $\mathbb{E}[u\bar{u}]$.
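
As a final sanity check (not a substitute for the argument), here is a small simulation sketch, again assuming Python with numpy, that estimates $\mathbb{E}[(\hat{\beta}_0-\beta_0)\bar{u}]$ and $\mathbb{E}[(\hat{\beta}_1-\beta_1)\bar{u}]$ by brute force; the first should come out close to $\sigma^2/n$ and the second close to $0$:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, sigma = 30, 100000, 1.0
    x = rng.normal(size=n)                        # fixed design
    X = np.column_stack([np.ones(n), x])
    beta = np.array([2.0, -1.5])

    U = rng.normal(scale=sigma, size=(reps, n))   # iid errors, one replication per row
    Y = X @ beta + U                              # responses, shape (reps, n)
    B = np.linalg.lstsq(X, Y.T, rcond=None)[0].T  # OLS estimates, shape (reps, 2)
    ubar = U.mean(axis=1)                         # mean error in each replication

    print(np.mean((B[:, 0] - beta[0]) * ubar))    # ~ sigma**2 / n = 0.0333 (intercept)
    print(np.mean((B[:, 1] - beta[1]) * ubar))    # ~ 0 (slope)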