Does $E(\sum e_i^2) = \sum E(y_i^2) - E(\sum \hat y_i^2)$ hold true?

proof-explanation, regression, statistics

This was posted as a practice proof for a regression class. I've worked through it from the perspective of $SSE = SST - SSR$, but I cannot reduce it to the given equation. There were other mistakes in this practice homework, so it's possible this problem is missing something, but I don't want to blame my inability to prove the equation on the problem's design.

We know that $E(\sum e_i^2) = E\sum (y_i - \hat y_i)^2$, but when I expand the square, this side does not equal the right side of the equation. I thought to start from $SSE = SST - SSR$ and reduce that initial setup, $E\sum (y_i - \hat y_i)^2 = E\sum (y_i - \bar y)^2 - E\sum (\hat y_i - \bar y)^2$, but again I was unable to reduce it to the equation given in the title. As I expand, there are leftover cross terms $2y_i \hat y_i$ and $2\hat y_i \bar y_i$, which I can't remove. I'd appreciate any insight into what I'm missing (or what the initial question is missing).

Best Answer

You are close. I presume that from $SSE = SST - SSR$ you have $$\sum_i e_i^2 = \sum_i (y_i - \bar{y})^2 - \sum_i (\hat{y}_i - \bar{y})^2 = \sum_i y_i^2 - \sum_i \hat{y}_i^2 - 2 \sum_i (y_i - \hat{y}_i) \bar{y},$$ and are wondering what to do with the extra term.
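Writing out the expansion explicitly, the $n\bar{y}^2$ terms cancel and the cross terms collapse into that single extra term:
$$\sum_i (y_i - \bar{y})^2 - \sum_i (\hat{y}_i - \bar{y})^2 = \left(\sum_i y_i^2 - 2\bar{y}\sum_i y_i\right) - \left(\sum_i \hat{y}_i^2 - 2\bar{y}\sum_i \hat{y}_i\right) = \sum_i y_i^2 - \sum_i \hat{y}_i^2 - 2\sum_i (y_i - \hat{y}_i)\bar{y}.$$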

It turns out the last term is zero; you can prove this by looking at the "normal equations" (i.e. look at the derivation of the least squares coefficients) or by laboriously plugging in the definition of $\hat{y}_i$. In particular, as long as the model includes an intercept, the normal equations force the residuals to sum to zero, so $\sum_i (y_i - \hat{y}_i)\bar{y} = \bar{y} \sum_i e_i = 0$.
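If you want to convince yourself numerically, here is a minimal sketch using numpy (the data and model are made up; any least squares fit with an intercept column will do) that checks both the vanishing cross term and the identity itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: a simple linear model with noise.
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=n)

# Least squares fit with an intercept column.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
e = y - y_hat

y_bar = y.mean()

# The cross term vanishes because the residuals sum to zero.
print(np.sum((y - y_hat) * y_bar))                     # ~0 up to rounding
# SSE equals sum(y_i^2) - sum(y_hat_i^2), no expectations needed.
print(np.sum(e**2), np.sum(y**2) - np.sum(y_hat**2))   # equal up to rounding
```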

Note that the result holds even without the expectations.

[By the way, why do you have an index $i$ in $\bar{y}_i$? Isn't $\bar{y} := \frac{1}{n} \sum_i y_i$?]