Solved – Variance of residuals vs. MLE of the variance of the error term

least squares, maximum likelihood, residuals

An important basic theorem of linear regression is that the maximum likelihood estimates (MLEs) of the coefficients coincide with their least-squares estimates.

What about the variance of the error term? Does its MLE coincide with the variance of the residuals, that is,

$$ \frac{\sum_i (Y_i - \hat Y_i)^2}{n} $$

where the $Y_i$s are the observed values of the dependent variable, the $\hat Y_i$s are the fitted values, and $n$ is the sample size?

Best Answer

If $Y \sim \mathcal N(X\beta, \sigma^2 I)$, where the predictor matrix $X$ is non-stochastic and of full rank, then the log likelihood is $$ l(\beta, \sigma^2|y) = -\frac n2 \log (2\pi) - \frac n2 \log(\sigma^2) - \frac 1{2\sigma^2}||y-X\beta||^2. $$ Plugging in the least-squares estimate $\hat\beta$ and solving $$ \frac{\partial l}{\partial \sigma^2} = 0 $$ gives the MLE $$ \hat \sigma^2 = \frac 1n ||Y-X\hat \beta||^2. $$
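As a sanity check, we can maximize this log likelihood numerically and compare against the closed forms. This is a sketch using numpy and scipy with simulated data (the design, coefficients, and seed are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def neg_log_lik(theta):
    # Parameterize by log(sigma^2) so the optimizer stays in the valid region.
    beta, log_s2 = theta[:2], theta[2]
    s2 = np.exp(log_s2)
    r = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi) + 0.5 * n * log_s2 + r @ r / (2 * s2)

fit = minimize(neg_log_lik, x0=np.zeros(3), method="BFGS")
beta_mle, sigma2_mle = fit.x[:2], np.exp(fit.x[2])

# Closed forms: least-squares beta_hat, then sigma^2 = ||y - X beta_hat||^2 / n
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / n
```

Up to optimizer tolerance, `beta_mle` matches `beta_hat` and `sigma2_mle` matches `sigma2_hat`, illustrating that the MLE of $\beta$ is the least-squares fit and the MLE of $\sigma^2$ divides the residual sum of squares by $n$.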

We want to know when the MLE $\hat \sigma^2$ is equal to the sample variance of the residuals $$ \tilde \sigma^2 = \frac{1}{n}\sum_{i=1}^n (e_i - \bar e)^2 $$

where $e = Y - \hat Y$ are the residuals. We know $$ n\tilde \sigma^2 = e^Te - n\bar e^2 $$ while $$ n\hat \sigma^2 = e^Te, $$ so the two are equal exactly when $\bar e = 0$, which holds whenever the constant vector $\mathbf 1$ is in the column space of $X$ (e.g. when the model includes an intercept). If that is not the case, the two won't be exactly equal.
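This is easy to verify numerically. Below is a minimal sketch using numpy with made-up simulated data: with an intercept column the residuals sum to zero and the two quantities agree; fitting through the origin, they differ by exactly $\bar e^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

def mle_and_resid_var(X, y):
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    sigma2_mle = e @ e / len(y)               # MLE: e'e / n
    resid_var = np.mean((e - e.mean()) ** 2)  # sample variance of residuals
    return sigma2_mle, resid_var, e

# With an intercept column, 1 is in the column space of X, so e-bar = 0.
X1 = np.column_stack([np.ones(n), x])
m1, v1, e1 = mle_and_resid_var(X1, y)

# Without an intercept, e-bar is generally nonzero and the two differ.
X0 = x.reshape(-1, 1)
m0, v0, e0 = mle_and_resid_var(X0, y)
```

Here `m1 == v1` up to floating point, while `m0 - v0` equals `e0.mean() ** 2`, matching the identity $n\hat\sigma^2 - n\tilde\sigma^2 = n\bar e^2$.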


I'm leaving the rest of my answer here but as I understand OP's question better I don't think it applies.

Note $$ ||Y - X\hat \beta||^2 = (Y - HY)^T(Y - HY) = Y^T(I-H)Y $$ where $H = X(X^TX)^{-1}X^T$. This means that we have a Gaussian quadratic form, so $$ Var\left(Y^T (I-H)Y\right) = 2\sigma^4 \text{tr}(I-H) + 4\sigma^2 \beta^T X^T(I-H)X\beta. $$ $X^T(I-H)X = X^TX - X^TX(X^TX)^{-1}X^TX = 0$ and $\text{tr}(I-H) = n-p$ so we have $$ Var(\hat \sigma^2) = \frac{2\sigma^4(n-p)}{n^2}. $$
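The two algebraic facts used above, $X^T(I-H)X = 0$ and $\text{tr}(I-H) = n-p$, can be checked directly. A small sketch with a made-up design matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix: projects onto col(X)

# (I - H) annihilates anything in the column space of X, so X^T (I-H) X = 0.
M = X.T @ (np.eye(n) - H) @ X

# H is a rank-p projection, so tr(I - H) = n - p.
trace_residual = np.trace(np.eye(n) - H)
```

`M` is the zero matrix and `trace_residual` equals $n - p = 17$ up to floating point.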

The standard estimate of $\sigma^2$ is probably $s^2 := \frac{1}{n-p}||Y - X\hat \beta||^2$ (which is unbiased, as we can see by computing $E\left(Y^T(I-H)Y\right) = \sigma^2(n-p)$), so $$ Var(s^2) = \frac{2\sigma^4}{n-p}. $$ (I write $s^2$ here to avoid a clash with the $\tilde\sigma^2$ defined above.)
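Since both estimators share the numerator $e^Te$, they differ only by the fixed factor $(n-p)/n$. A quick sketch with simulated data (design and seed made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ y  # residuals e = (I - H) Y

sigma2_mle = e @ e / n   # MLE: divides the RSS by n
s2 = e @ e / (n - p)     # unbiased estimator: divides the RSS by n - p
```

Here `sigma2_mle` equals `s2 * (n - p) / n` exactly, so the MLE is biased downward by that factor, which vanishes as $n \to \infty$ for fixed $p$.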

I'm not entirely sure what more than this you're looking for, since what you literally asked for is the variance of the residuals, which is $$ Var(e) = Var\left((I-H)Y\right) =\sigma^2 (I-H), $$ but I don't think that's what you mean. Or if it is, then we can compare this directly to $Var(\varepsilon) = \sigma^2 I$, and the difference comes down to $\sigma^2 H$.
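The step $Var\left((I-H)Y\right) = \sigma^2(I-H)$ relies on $I-H$ being symmetric and idempotent, so that $(I-H)(I-H)^T = I-H$. A minimal numerical sketch of that property, with a made-up design:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.inv(X.T @ X) @ X.T
A = np.eye(n) - H  # e = A Y, so Var(e) = A Var(Y) A^T = sigma^2 A A^T

# A is symmetric and idempotent, hence A A^T = A and Var(e) = sigma^2 (I - H).
```

So `A @ A.T` reproduces `A` up to floating point, confirming $Var(e) = \sigma^2(I-H)$.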
