Regression – Proof that MSE is an Unbiased Estimator in Multiple Regression

multiple regression, regression

I am trying to prove that in multiple linear regression $E\left[\sum_i (Y_i - \hat Y_i)^2\right] = (n-2)\sigma^2$, i.e., that $MSE$ is an unbiased estimator of $\sigma^2$.

Here is my approach:

Under the usual notation,

$$ Y = X\beta + \epsilon $$
$$ \hat Y = X\hat\beta $$
$$ \hat\beta = (X'X)^{-1}X'Y \implies \hat\beta' = Y'X(X'X)^{-1} $$
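Substituting $Y = X\beta + \epsilon$ into the expression for $\hat\beta$ gives an identity that is used implicitly in every term below:

$$ \hat\beta = (X'X)^{-1}X'(X\beta + \epsilon) = \beta + (X'X)^{-1}X'\epsilon \implies \hat\beta - \beta = (X'X)^{-1}X'\epsilon $$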

Now,
\begin{align}
\sum_i (Y_i - \hat Y_i)^2 & = (Y - \hat Y)'(Y - \hat Y) \\
& = (X(\beta - \hat \beta) + \epsilon)' (X(\beta - \hat \beta) + \epsilon)\\
& = \underbrace{(\beta - \hat \beta)'X'X(\beta - \hat \beta)}_{\text{Term 1}} + \underbrace{\epsilon'X (\beta - \hat \beta)}_{\text{Term 2}}\\
& \quad + \underbrace{(\beta - \hat \beta)'X'\epsilon}_{\text{Term 3}} + \epsilon'\epsilon
\end{align}

Simplifying the individual terms

Term 1:
\begin{align}
(\beta - \hat \beta)'X'X(\beta - \hat \beta) &= (\beta - (X'X)^{-1}X'Y)'X'X(\beta - (X'X)^{-1}X'Y)\\
& = (\beta' - Y'X(X'X)^{-1})X'X(\beta - (X'X)^{-1}X'Y) \\
& = \beta'X'X\beta - Y'X\beta - \beta'(X'X)(X'X)^{-1}X'Y + Y'X(X'X)^{-1}X'Y \\
& = \beta'X'X\beta - (\beta'X' + \epsilon')X\beta - \beta'(X'X)(X'X)^{-1}X'Y \\
& \quad + (\beta'X' + \epsilon')X(X'X)^{-1}X'Y \quad \text{(substituting } Y' = \beta'X' + \epsilon') \\
& = -\epsilon'X\beta + \epsilon'X(X'X)^{-1}X'Y \quad \text{(several terms cancel)} \\
& = -\epsilon'X\beta + \epsilon'X(X'X)^{-1}X'(X\beta + \epsilon) \quad \text{(substituting } Y = X\beta + \epsilon) \\
& = \epsilon'X(X'X)^{-1}X'\epsilon
\end{align}
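With the identity noted above ($\hat\beta - \beta = (X'X)^{-1}X'\epsilon$), Term 1 also falls out in one line:

$$ (\beta - \hat\beta)'X'X(\beta - \hat\beta) = \epsilon'X(X'X)^{-1}(X'X)(X'X)^{-1}X'\epsilon = \epsilon'X(X'X)^{-1}X'\epsilon $$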

Term 2:
\begin{align}
\epsilon'X (\beta - \hat \beta) &= \epsilon'X(\beta - (X'X)^{-1}X'Y)\\
& = \epsilon'X(\beta - (X'X)^{-1}X'X\beta) \quad \text{(substituting the value of } Y) \\
& = 0
\end{align}

Since Term 3 is the transpose of Term 2 (and both are scalars), Term 3 = 0 as well.

\begin{align}
\sum_i (Y_i - \hat Y_i)^2 & = \epsilon'X(X'X)^{-1}X'\epsilon + \epsilon'\epsilon \\
E\left[\sum_i (Y_i - \hat Y_i)^2\right] & = E\left[\epsilon'X(X'X)^{-1}X'\epsilon + \epsilon'\epsilon\right]
\end{align}
I'm stuck here and unable to simplify further. Can someone please help?

What further baffles me is that the RHS is greater than $n\sigma^2$, since $E(\epsilon'\epsilon) = n\sigma^2$.

Best Answer

Martijn Weterings's comment is very useful. Your derivation of Term 2 is wrong:

\begin{align}
\epsilon'X (\beta - \hat \beta) &= \epsilon'X(\beta - (X'X)^{-1}X'Y) \\
&= \epsilon'X\left\{\beta - (X'X)^{-1}X'(X\beta+\epsilon)\right\} \\
&= \epsilon'X\left\{\beta - (X'X)^{-1}X'X\beta - (X'X)^{-1}X'\epsilon\right\} \\
&= -\epsilon'X(X'X)^{-1}X'\epsilon
\end{align}

Now

\begin{align}
\sum_i (Y_i - \hat Y_i)^2 &= \epsilon'X(X'X)^{-1}X'\epsilon - \epsilon'X(X'X)^{-1}X'\epsilon - \epsilon'X(X'X)^{-1}X'\epsilon + \epsilon'\epsilon \\
&= \epsilon'\epsilon - \epsilon'X(X'X)^{-1}X'\epsilon \\
&= \epsilon'\epsilon - \epsilon'P\epsilon
\end{align}

Here $P = X(X'X)^{-1}X'$ is the projection (hat) matrix, which is symmetric ($P' = P$) and idempotent ($P^2 = P$).

Now calculate the expectation.

\begin{align}
E\left[\sum_i (Y_i - \hat Y_i)^2\right] &= E(\epsilon'\epsilon - \epsilon'P\epsilon) \\
&= E(\epsilon'\epsilon) - E(\epsilon'P\epsilon) \\
&= n\sigma^2 - \sigma^2\operatorname{trace}(P) \\
&= (n-k)\sigma^2 \qquad \text{(writing } \operatorname{trace}(P) = k\text{)}
\end{align}
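For completeness, the last two steps use two standard facts. With $E(\epsilon) = 0$ and $\operatorname{Var}(\epsilon) = \sigma^2 I_n$,

$$ E(\epsilon'P\epsilon) = E\left[\operatorname{trace}(P\epsilon\epsilon')\right] = \operatorname{trace}\left(P\,E(\epsilon\epsilon')\right) = \sigma^2\operatorname{trace}(P), $$

and by the cyclic property of the trace (assuming $X$ has full column rank $k$),

$$ \operatorname{trace}(P) = \operatorname{trace}\left(X(X'X)^{-1}X'\right) = \operatorname{trace}\left((X'X)^{-1}X'X\right) = \operatorname{trace}(I_k) = k. $$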

$\therefore \frac{\sum_i (Y_i - \hat Y_i)^2}{n-k}$ is an unbiased estimator of $\sigma^2$, where $k$ is the number of parameters you estimate; for example, if you estimate an intercept $\beta_0$ and one predictor coefficient $\beta_1$, then $k = 2$.
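As a quick numerical sanity check (a minimal sketch, not part of the original answer), the simulation below repeatedly fits OLS to data generated with a known $\sigma^2$ and checks that $\sum_i (Y_i - \hat Y_i)^2/(n-k)$ averages to $\sigma^2$; the design matrix, coefficients, seed, and replication count are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch: check empirically that SSE / (n - k) is unbiased for sigma^2.
rng = np.random.default_rng(0)

n, k = 50, 3          # n observations, k = 3 parameters (intercept + 2 slopes)
sigma2 = 4.0          # true error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design matrix
beta = np.array([1.0, 2.0, -0.5])                           # arbitrary true coefficients

n_reps = 20_000
mse = np.empty(n_reps)
for r in range(n_reps):
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)   # epsilon ~ N(0, sigma^2 I)
    y = X @ beta + eps
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS: (X'X)^{-1} X'y
    resid = y - X @ beta_hat
    mse[r] = resid @ resid / (n - k)                  # SSE / (n - k)

print(mse.mean())  # ~ 4.0, consistent with E[SSE / (n - k)] = sigma^2
```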
