[Math] How is the sample variance an unbiased estimator for population variance

parameter estimation, statistics

I know some variation of this question has been asked repeatedly, but I don't think any of them answer my particular question.

I understand the definition of a maximum likelihood estimator and the proof that $\displaystyle \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}(X_{i}-\overline{X})^{2}$ is the MLE of $\sigma^{2}$. I also understand that $E(\hat{\theta}) = \frac{(n-1)}{n}\sigma^{2}$ and that therefore $\displaystyle S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{i}-\overline{X})^{2}$ is unbiased.
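(For reference, the bias computation I mean is the standard one; it only assumes the $X_i$ are i.i.d. with mean $\mu$ and variance $\sigma^{2}$:

$$\sum_{i=1}^{n}(X_{i}-\overline{X})^{2} = \sum_{i=1}^{n}(X_{i}-\mu)^{2} - n(\overline{X}-\mu)^{2}
\;\Longrightarrow\;
E\!\left[\sum_{i=1}^{n}(X_{i}-\overline{X})^{2}\right] = n\sigma^{2} - n\cdot\frac{\sigma^{2}}{n} = (n-1)\sigma^{2},$$

so dividing by $n$ gives expectation $\frac{n-1}{n}\sigma^{2}$, while dividing by $n-1$ gives exactly $\sigma^{2}$.)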

What I don't understand is how we know that $S^{2}$ is still an estimator for $\sigma^{2}$. The calculation for the MLE resulted in $\hat{\theta}$, so wouldn't this still be the best guess for $\sigma^{2}$, even if that guess is biased? In what sense is $S^{2}$ better, if it doesn't maximize the likelihood of the population parameter?

Best Answer

First ask yourself, what does it mean for a statistic to be an estimator? Do all estimators have to be "good" ones?

Next, the MLE is "best" in the sense that it maximizes the likelihood function for the observed sample, but that does not mean it is the only suitable choice of estimator. It is, in some sense, the most plausible value of the parameter given the data we observed, but from the point of view of bias it tends to underestimate the true variance. That is to say, for any fixed sample size the MLE of $\sigma^2$ will, on average, give an estimate that is too small, and the shortfall is most pronounced when the sample size is small; $S^2$ does not have this problem, since it is correct on average.
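A quick way to see the underestimation concretely is a small Monte Carlo sketch (illustrative only; the sample size $n$, the number of replications, and the true variance below are arbitrary choices, and normal data are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma2 = 5, 200_000, 4.0       # small sample size, true variance 4 (assumed values)

# Draw many samples of size n and compute both estimators for each sample.
x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
mle = x.var(axis=1, ddof=0)             # divide by n:   the MLE under normality
s2  = x.var(axis=1, ddof=1)             # divide by n-1: the unbiased estimator S^2

print("true variance:  ", sigma2)
print("average of MLE: ", mle.mean())   # roughly (n-1)/n * sigma2 = 3.2
print("average of S^2: ", s2.mean())    # roughly sigma2 = 4.0
```

With $n=5$ the gap is substantial; as $n$ grows both averages approach $\sigma^2$, which is the asymptotic-unbiasedness point made below.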

The quality of an estimator can also be judged by other desirable properties, e.g., consistency, asymptotic unbiasedness, minimum mean squared error, or being UMVUE. Maximum likelihood is just one possible criterion.
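For example, if mean squared error is the criterion and we assume the $X_i$ are i.i.d. normal (an assumption made here only so the computation is explicit), the ranking actually flips: using $(n-1)S^{2}/\sigma^{2} \sim \chi^{2}_{n-1}$,

$$\operatorname{MSE}(S^{2})=\operatorname{Var}(S^{2})=\frac{2\sigma^{4}}{n-1},
\qquad
\operatorname{MSE}(\hat{\theta})=\frac{2(n-1)\sigma^{4}}{n^{2}}+\left(\frac{\sigma^{2}}{n}\right)^{\!2}=\frac{(2n-1)\sigma^{4}}{n^{2}}<\frac{2\sigma^{4}}{n-1},$$

so the biased MLE has strictly smaller MSE than $S^{2}$ in that setting. Which estimator is "better" really does depend on which criterion you care about.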