I am trying to prove that in multiple linear regression the residual sum of squares satisfies $E\big(\Sigma (Y_i - \hat Y_i)^2\big) = (n-2)\sigma^2$.
Here is my approach:
Under the usual notation,
$$ Y = X\beta + \epsilon $$
$$ \hat Y = X\hat\beta $$
$$ \hat\beta = (X'X)^{-1}X'Y \implies \hat\beta' = Y'X(X'X)^{-1} $$
Now,
\begin{align}
\Sigma (Y_i - \hat Y_i)^2 & = (Y - \hat Y)'(Y - \hat Y) \\
& = (X(\beta - \hat \beta) + \epsilon)' (X(\beta - \hat \beta) + \epsilon)\\
& = \underbrace {(\beta - \hat \beta)'X'X(\beta - \hat \beta)}_{\text{term 1}} + \underbrace {\epsilon'X (\beta - \hat \beta)}_{\text{term 2}}\\
& \quad + \underbrace {(\beta - \hat \beta)'X'\epsilon}_{\text{term 3}} + \epsilon'\epsilon
\end{align}
Simplifying the individual terms:
Term 1: \begin{align}
(\beta - \hat \beta)'X'X(\beta - \hat \beta) &= (\beta - (X'X)^{-1}X'Y)'X'X(\beta - (X'X)^{-1}X'Y)\\
& = (\beta' - Y'X(X'X)^{-1})X'X(\beta - (X'X)^{-1}X'Y) \\
& = \beta'X'X\beta - Y'X\beta - \beta'(X'X)(X'X)^{-1}X'Y + Y'X(X'X)^{-1}X'Y \\
& = \beta'X'X\beta - (\beta'X' + \epsilon')X\beta - \beta'(X'X)(X'X)^{-1}X'Y \\ & \quad + (\beta'X' + \epsilon')X(X'X)^{-1}X'Y \quad \text{(substituting the value of } Y') \\
& = - \epsilon'X\beta + \epsilon'X(X'X)^{-1}X'Y \quad \text{(the remaining terms cancel)} \\
& = - \epsilon'X\beta + \epsilon'X(X'X)^{-1}X'(X\beta + \epsilon) \quad \text{(substituting the value of } Y) \\
& = \epsilon'X(X'X)^{-1}X'\epsilon
\end{align}
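As a quick numerical sanity check of the term-1 identity $(\beta - \hat\beta)'X'X(\beta - \hat\beta) = \epsilon'X(X'X)^{-1}X'\epsilon$, here is a NumPy sketch (the sizes $n$, $k$ and the random data are arbitrary choices of mine, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                       # arbitrary: n observations, k parameters
X = rng.normal(size=(n, k))
beta = rng.normal(size=k)
eps = rng.normal(size=n)
Y = X @ beta + eps                 # Y = X beta + epsilon

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y       # OLS estimator (X'X)^{-1} X'Y

lhs = (beta - beta_hat) @ X.T @ X @ (beta - beta_hat)   # term 1
rhs = eps @ X @ XtX_inv @ X.T @ eps                     # eps' P eps
print(np.isclose(lhs, rhs))       # → True
```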
Term 2 :
\begin{align}
\epsilon'X (\beta - \hat \beta) &= \epsilon'X(\beta - (X'X)^{-1}X'Y)\\
& = \epsilon'X(\beta - (X'X)^{-1}X'X\beta)\quad \text{(substituting the value of } Y) \\
& = 0
\end{align}
Since Term 3 is the transpose of Term 2, Term 3 = 0 as well.
\begin{align}
\Sigma (Y_i - \hat Y_i)^2 & = \epsilon'X(X'X)^{-1}X'\epsilon + \epsilon'\epsilon \\
E\big(\Sigma (Y_i - \hat Y_i)^2\big) & = E\big(\epsilon'X(X'X)^{-1}X'\epsilon + \epsilon'\epsilon\big)
\end{align}
I'm stuck here, unable to make any further simplifications. Can someone please help?
What further baffles me is that the RHS is then greater than $n\sigma^2$, since $E(\epsilon'\epsilon) = n\sigma^2$.
Best Answer
Martijn Weterings's comment is very useful. Your derivation of term 2 is wrong:
$\epsilon'X (\beta - \hat \beta) \\= \epsilon'X(\beta - (X'X)^{-1}X'Y) \\=\epsilon'X\left\{\beta - (X'X)^{-1}X'(X\beta+\epsilon)\right\}\\=\epsilon'X \left\{\beta-(X'X)^{-1}X'X\beta -(X'X)^{-1}X'\epsilon\right\}\\=-\epsilon'X(X'X)^{-1}X'\epsilon$
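A small NumPy check confirms the corrected term 2 equals $-\epsilon'X(X'X)^{-1}X'\epsilon$ rather than 0 (sizes and data below are arbitrary illustrations, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 2                            # arbitrary sizes
X = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5])
eps = rng.normal(size=n)
Y = X @ beta + eps

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection matrix P = X(X'X)^{-1}X'
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

term2 = eps @ X @ (beta - beta_hat)
print(np.isclose(term2, -eps @ P @ eps))   # → True: term 2 is -eps'P eps, not 0
```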
Now
$\Sigma (Y_i - \hat Y_i)^2\\=\epsilon'X(X'X)^{-1}X'\epsilon-\epsilon'X(X'X)^{-1}X'\epsilon-\epsilon'X(X'X)^{-1}X'\epsilon+\epsilon'\epsilon\\=\epsilon'\epsilon-\epsilon'X(X'X)^{-1}X'\epsilon\\=\epsilon'\epsilon-\epsilon'P\epsilon$
$P = X(X'X)^{-1}X'$ is the projection (hat) matrix, which is symmetric and idempotent.
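These properties of $P$, including $\operatorname{trace}(P) = k$, are easy to verify numerically (again with arbitrary sizes of my choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 4                            # arbitrary: X assumed full column rank
X = rng.normal(size=(n, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection matrix

print(np.allclose(P, P.T))              # → True: symmetric
print(np.allclose(P @ P, P))            # → True: idempotent
print(np.isclose(np.trace(P), k))       # → True: trace equals number of parameters
```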
Now calculate the expectation, using the quadratic-form identity $E(\epsilon'A\epsilon) = \sigma^2 \operatorname{trace}(A)$ for $\epsilon$ with mean zero and covariance $\sigma^2 I$:
$E[\Sigma (Y_i - \hat Y_i)^2]\\=E(\epsilon'\epsilon-\epsilon'P\epsilon)\\=E(\epsilon'\epsilon)-E(\epsilon'P\epsilon)\\=n\sigma^2-\sigma^2\operatorname{trace}(P) \\\text{(let trace}(P)=k)$
$=(n-k)\sigma^2$
$\therefore \frac{\Sigma (Y_i - \hat Y_i)^2}{n-k}$ is an unbiased estimator of $\sigma^2$, where $k$ is the number of parameters you estimate. For example, if you estimate an intercept $\beta_0$ and one slope $\beta_1$, then $k = 2$.
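The unbiasedness conclusion can also be checked by Monte Carlo simulation. The sketch below (my own illustration; $n$, $\sigma$, $\beta$ and the number of replications are arbitrary) fits an intercept-plus-one-predictor model, so $k = 2$, and averages $\mathrm{SSE}/(n-k)$ over many draws of $\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma = 25, 2, 1.5                # arbitrary sample size and noise level
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta = np.array([2.0, -1.0])

reps = 20_000
sse = np.empty(reps)
for r in range(reps):
    eps = rng.normal(scale=sigma, size=n)
    Y = X @ beta + eps
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]    # OLS fit
    sse[r] = np.sum((Y - X @ beta_hat) ** 2)           # residual sum of squares

print(sse.mean() / (n - k))             # should be close to sigma**2 = 2.25
```

The simulated average of $\mathrm{SSE}/(n-k)$ lands near $\sigma^2$, while dividing by $n$ instead would be biased low.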