Covariance of Residuals and Fitted Values in Linear Regression

Tags: covariance, expected value, linear regression, regression, regression analysis

Consider the simple linear regression model

$Y_i = \beta_0 + \beta_1x_i + \epsilon_i$

where $\epsilon_i \overset{\text{indep}}{\sim} N(0, \sigma^2)$ for $i = 1,\dots,n$. Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the usual maximum likelihood estimators of $\beta_0$ and $\beta_1$, respectively. The $i$th residual is defined as $\hat{\epsilon}_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ is the $i$th fitted value.

Derive $Cov(\hat{\epsilon_{i}},\hat{Y_i})$.

This is what I've got so far; I keep getting stuck, though, even after trying this several different ways.

$$
\begin{align}
Cov(\hat{\epsilon_{i}},\hat{Y_i})
&= E[\hat{\epsilon_{i}}\hat{Y_i}] - E[\hat{\epsilon_{i}}]E[\hat{Y_i}]\\
&= E[\hat{\epsilon_{i}}\hat{Y_i}]
\text{ (as }E[\hat{\epsilon_{i}}]=0)\\
&= E[\hat{\epsilon_{i}}(Y_i - \hat{\epsilon_{i}})]\\
&= E[\hat{\epsilon_{i}}Y_i] - E[\hat{\epsilon_{i}}^2]
\end{align}
$$
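Before trying to prove anything, the covariance can be estimated by simulation. Below is a quick Monte Carlo sketch (the parameter values, design points, and chosen index are all made up for illustration): simulate the model repeatedly with a fixed design, refit by least squares each time, and estimate $Cov(\hat{\epsilon}_i, \hat{Y}_i)$ for one $i$ across replications.

```python
import numpy as np

# Monte Carlo sanity check (assumed/made-up parameter values and design):
# repeatedly simulate Y_i = beta0 + beta1*x_i + eps_i, refit by least squares,
# and estimate Cov(eps_hat_i, Y_hat_i) for a single fixed i across replications.
rng = np.random.default_rng(0)
n, reps = 10, 20000
beta0, beta1, sigma = 1.0, 2.0, 1.5        # hypothetical true parameters
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])        # design matrix with intercept

i = 3                                       # arbitrary index to track
resid_i, fit_i = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + sigma * rng.standard_normal(n)
    bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ bhat
    resid_i[r], fit_i[r] = y[i] - yhat[i], yhat[i]

# The sample covariance across replications should be close to 0.
print(np.cov(resid_i, fit_i)[0, 1])
```

The sample covariance comes out near zero, consistent with the suspicion below that the exact answer is zero.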

But I don't know how to proceed further with this, as the residuals are not independent of the responses. Proceeding similarly from the second line but rearranging differently, I also got

$$
Cov(\hat{\epsilon_{i}},\hat{Y_i}) = E[Y_i\hat{Y_{i}}] - E[\hat{Y_{i}}^2]
$$

I'm having the same problem here. I feel like the answer is supposed to be zero, but I'm just missing some piece of information needed to prove it.


Edit: I've been retrying this question and I think this may be a better method, although I'm stuck at a point with this too:
$$
\begin{align}
Cov(\hat{\epsilon_{i}},\hat{Y_i})
&= Cov(Y_i – \hat{Y_i}, \hat{Y_i})\\
&= Cov(Y_i ,\hat{Y_i}) – Cov(\hat{Y_i},\hat{Y_i})\\
&= Cov(Y_i ,\hat{Y_i}) – Var(\hat{Y_i})
\end{align}
$$

So, I know what $Var(\hat{Y_i})$ is, but I do not know what $Cov(Y_i ,\hat{Y_i})$ is, although I suspect it is $Var(\hat{Y_i})$. If someone could help me with a derivation for this, that would be amazing.
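That suspicion can be checked numerically via the hat matrix $H = X(X^TX)^{-1}X^T$ (a standard reframing, not part of the derivation above): since $\hat Y = HY$ and $Var(Y) = \sigma^2 I$, we have $Cov(Y, \hat Y) = \sigma^2 H$ and $Var(\hat Y) = \sigma^2 H H^T = \sigma^2 H$, because $H$ is symmetric and idempotent. A minimal sketch with an arbitrary made-up design:

```python
import numpy as np

# Hat-matrix check (arbitrary design chosen for illustration): with
# Y_hat = H @ Y and H = X (X^T X)^{-1} X^T, symmetry and idempotence of H
# give Cov(Y, Y_hat) = sigma^2 * H = Var(Y_hat), entrywise.
n = 10
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H, H.T))     # True: H is symmetric
print(np.allclose(H @ H, H))   # True: H is idempotent
```

Together these imply $Cov(Y_i, \hat Y_i) = \sigma^2 H_{ii} = Var(\hat Y_i)$, so indeed $Cov(\hat\epsilon_i, \hat Y_i) = 0$.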

Best Answer

\begin{align}
cov(Y_i, \hat Y_i) &= cov(Y_i, \hat \beta_0 + \hat \beta_1 X_i )\\
& = cov(Y_i, \bar Y - \hat \beta_1 \bar X + \hat \beta_1 X_i )\\
& = cov(Y_i, \bar Y + \hat \beta_1 (X_i - \bar X) )\\
& = cov(Y_i, \bar Y) + cov\left(Y_i , \frac{(X_i - \bar X) \sum_j (X_j - \bar X ) Y_j }{\sum_j ( X_j - \bar X ) ^ 2}\right)\\
& = \frac{1}{n}Var(Y_i) + \frac{(X_i - \bar X) ^ 2 Var(Y_i) }{\sum_j ( X_j - \bar X ) ^ 2} \quad \text{(only the $j = i$ term survives, since the $Y_j$ are independent)}\\
& = \frac{1}{n}\sigma ^ 2 + \frac{\sigma ^ 2 (X_i - \bar X) ^ 2 }{\sum_j ( X_j - \bar X ) ^ 2}
\end{align}

which is the same as $Var(\hat Y_i)$.
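As a final cross-check (again with arbitrary made-up $x$ values), the closed form above should equal $\sigma^2$ times the corresponding diagonal entry of the hat matrix, since $Var(\hat Y_i) = \sigma^2 H_{ii}$ with $H = X(X^TX)^{-1}X^T$:

```python
import numpy as np

# Check that diag(H) matches 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2,
# the closed form derived above (up to the sigma^2 factor).
n = 10
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T

closed_form = 1.0 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
print(np.allclose(np.diag(H), closed_form))   # True
```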
