[Math] MSEs of Estimators of Variance in Normal Distribution

mean square error, normal distribution, probability, statistics

$\newcommand{\MSE}{\operatorname{MSE}}$Consider the mean squared error (MSE) of the following estimators of the variance $\sigma^2$, where $X_1,\dots,X_n$ are i.i.d. normal:

$$\MSE(S^2)=\MSE\left(\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\right) = \frac{2}{n-1} \sigma^4$$

$$\MSE(S_1^2)=\MSE\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right) = \frac{2n-1}{n^2}\sigma^4$$

And, in the case where $\mu$ is known:

$$\MSE(S_0^2)=\MSE\left(\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2\right) = \frac{2}{n}\sigma^4$$
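(As a sanity check, not part of the original post, the three MSE formulas above can be verified by simulation. The following sketch uses NumPy; the values of $n$, $\mu$, $\sigma$, and the replication count are arbitrary illustrative choices.)

```python
import numpy as np

# Monte Carlo check of the three MSE formulas above.
# n, mu, sigma, and reps are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, mu, sigma = 10, 0.0, 2.0
reps = 200_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1, keepdims=True)

S2  = ((X - xbar) ** 2).sum(axis=1) / (n - 1)  # unbiased estimator S^2
S12 = ((X - xbar) ** 2).sum(axis=1) / n        # MLE S_1^2
S02 = ((X - mu)  ** 2).sum(axis=1) / n         # known-mean estimator S_0^2

mse = lambda est: np.mean((est - sigma**2) ** 2)

print(mse(S2),  2 * sigma**4 / (n - 1))         # ~ 2*sigma^4/(n-1)
print(mse(S12), (2*n - 1) * sigma**4 / n**2)    # ~ (2n-1)*sigma^4/n^2
print(mse(S02), 2 * sigma**4 / n)               # ~ 2*sigma^4/n
```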

In considering the MSE of these three estimators, I have two questions:

1) Why do we often use $S^2$ instead of $S_1^2$ as an estimator for $\sigma^2$, even though the latter has a lower MSE? If it is because $S^2$ has lower bias, then why do we define MSE the way we do at all? If minimizing bias matters more than minimizing variance, couldn't we define the MSE as, say, twice the squared bias plus the variance?

2) Intuitively, how is $\MSE(S_0^2) > \MSE(S_1^2)$? This doesn't really make sense to me: in cases where we know $\mu$, our MSE should only decrease. The only explanation I can think of is that if an entire sample happened to lie off to one side of $\mu$, the deviations from the population mean would clearly be greater than the deviations from the sample mean. But, on average, this shouldn't be the case.

Thanks in advance!

Best Answer

$\newcommand{\MSE}{\operatorname{MSE}}$It's not unusual to use the maximum-likelihood estimator of the variance, which is a biased estimator with a lower mean squared error than the best unbiased estimator. Nor is it a general rule that it is better to be unbiased than to have a small MSE. The fact that unbiasedness is in some instances a very bad thing was the point of this paper.

Now let's compare $S_1^2$ with $S_0^2.$ We have $$ \frac{nS_1^2}{\sigma^2} = \frac 1 {\sigma^2} \sum_{i=1}^n (X_i-\overline X)^2 \sim \chi_{n-1}^2, $$ a distribution with expectation $n-1$ and variance $2(n-1),$ and $$ \frac{nS_0^2}{\sigma^2} = \frac 1 {\sigma^2} \sum_{i=1}^n (X_i-\mu)^2 \sim \chi^2_n, $$ a distribution with expectation $n$ and variance $2n.$

Therefore \begin{align} \operatorname{E}(cS_1^2) & = \frac{c\sigma^2} n\cdot(n-1) \text{ and } \operatorname{var}(cS_1^2) = \frac{c^2 \sigma^4}{n^2} \cdot 2(n-1), \\[10pt] \text{so } \MSE(cS_1^2) & = \text{variance} + (\text{bias})^2 \\[10pt] & = \frac{c^2 \sigma^4}{n^2} \cdot 2(n-1) + \left( \frac{c\sigma^2} n\cdot(n-1) - \sigma^2 \right)^2. \\[10pt] \text{The value of $c$ that minimizes this is } c & = \frac n {n+1}, \\[10pt] \text{and then you have } \MSE(cS_1^2) & = \frac{2\sigma^4}{n+1}. \end{align} In the same way, you can find the value of $c$ that minimizes the MSE of $cS_0^2,$ and then you get $$ \Big(\MSE(cS_0^2) \text{ (with a different value of $c$, namely $c = \tfrac n{n+2}$)} \Big) = \frac{2\sigma^4} {n+2}. $$ Thus, as your intuition suggests, you can do better knowing $\mu$ than not knowing $\mu.$ But in order to do that, you have to multiply in each case by the appropriate value of $c.$ That's what you didn't do.
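(Again as a sanity check, not part of the original answer, a short NumPy simulation confirms that shrinking by $c = n/(n+1)$ and $c = n/(n+2)$ respectively attains MSEs of $2\sigma^4/(n+1)$ and $2\sigma^4/(n+2)$, so the known-$\mu$ estimator does win once the right $c$ is used. The values of $n$, $\mu$, $\sigma$, and the replication count are arbitrary illustrative choices.)

```python
import numpy as np

# Monte Carlo check of the optimally shrunk estimators:
# c*S_1^2 with c = n/(n+1) and c*S_0^2 with c = n/(n+2).
# n, mu, sigma, and reps are arbitrary illustrative choices.
rng = np.random.default_rng(1)
n, mu, sigma = 10, 0.0, 1.5
reps = 200_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1, keepdims=True)

S12 = ((X - xbar) ** 2).sum(axis=1) / n   # MLE (sample mean)
S02 = ((X - mu)  ** 2).sum(axis=1) / n    # known-mean estimator

mse = lambda est: np.mean((est - sigma**2) ** 2)

mse1 = mse(n / (n + 1) * S12)   # ~ 2*sigma^4/(n+1)
mse0 = mse(n / (n + 2) * S02)   # ~ 2*sigma^4/(n+2)
print(mse1, 2 * sigma**4 / (n + 1))
print(mse0, 2 * sigma**4 / (n + 2))
```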