[Math] Proof that the sample mean is the “best estimator” for the population mean.

Tags: parameter-estimation, statistical-inference, statistics

I've always heard that the sample mean $\overline{X}$ is "the best estimator" of the population mean $\mu$. But is that always true, regardless of the population distribution? Is there any proof of that?
For example, suppose that from an unknown population we draw three observations, say $X_1$, $X_2$, $X_3$. Based on what I've heard (not necessarily true), the estimator
$$\frac{1}{3}(X_1+X_2+X_3)$$
is always preferable to, for instance:
$$\frac{1}{6}(X_1+X_3)+\frac{2}{3}X_2$$
or
$$\max(X_1, X_2, X_3)$$
But in what sense is it better, and why?

Best Answer

It is not true that the sample mean is the 'best' estimator of the population mean for every underlying parent distribution. What is true regardless of the population distribution is that the sample mean is an unbiased estimator of the population mean, i.e. $E(\overline X)=\mu$.

Now unbiasedness is usually not the only criterion considered when choosing an estimator of an unknown quantity of interest. We generally also prefer estimators with smaller variance, or smaller mean squared error (MSE), since these are desirable properties for an estimator to have. And it may happen that $\overline X$ does not attain the minimum variance/MSE among all possible estimators.
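To make this concrete, here is a quick simulation comparing the three candidate estimators from the question. It is only a sketch: the parent distribution $N(5, 4)$ and all the numbers are illustrative choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 5.0, 2.0, 200_000

# Draw 200,000 samples of size 3 from N(mu, sigma^2)
X = rng.normal(mu, sigma, size=(reps, 3))

sample_mean = X.mean(axis=1)                          # (X1 + X2 + X3) / 3
weighted = (X[:, 0] + X[:, 2]) / 6 + 2 * X[:, 1] / 3  # (X1 + X3)/6 + 2 X2/3
maximum = X.max(axis=1)                               # max(X1, X2, X3)

for name, est in [("mean", sample_mean), ("weighted", weighted), ("max", maximum)]:
    print(f"{name:9s} bias ~ {est.mean() - mu:+.3f}   variance ~ {est.var():.3f}")
```

Both linear estimators come out unbiased, but the weighted one has variance $\left(\tfrac{1}{36}+\tfrac{4}{9}+\tfrac{1}{36}\right)\sigma^2 = \tfrac{\sigma^2}{2}$ versus $\tfrac{\sigma^2}{3}$ for the sample mean, while $\max(X_1,X_2,X_3)$ is visibly biased upward. So among these three, $\overline X$ wins on both counts for this distribution.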

Consider a sample $(X_1,X_2,\ldots,X_n)$ drawn from a uniform distribution on $(0,\theta)$. Now $T_1=\overline X$ is an unbiased estimator of the population mean $\theta/2$, but it does not attain the minimum variance among all unbiased estimators of $\theta/2$. It can be shown that the uniformly minimum variance unbiased estimator (UMVUE) of the population mean is instead $T_2=\frac{n+1}{2n}\max(X_1,\ldots,X_n)$. So $T_2$ is the best estimator within the unbiased class, where 'best' means 'having the smallest variance'.
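A simulation makes the comparison visible. This is a sketch with arbitrary illustrative values $\theta = 10$ and $n = 5$; analytically, $\operatorname{Var}(T_1)=\theta^2/(12n)$ while $\operatorname{Var}(T_2)=\theta^2/\bigl(4n(n+2)\bigr)$, so $T_2$ should show a clearly smaller variance.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 10.0, 5, 200_000

# Draw 200,000 samples of size n from Uniform(0, theta)
X = rng.uniform(0, theta, size=(reps, n))

T1 = X.mean(axis=1)                        # sample mean
T2 = (n + 1) / (2 * n) * X.max(axis=1)     # scaled maximum (UMVUE)

# Both estimate the population mean theta/2 = 5 without bias,
# but T2 concentrates much more tightly around it.
print(f"T1: mean ~ {T1.mean():.3f}   variance ~ {T1.var():.4f}")
print(f"T2: mean ~ {T2.mean():.3f}   variance ~ {T2.var():.4f}")
```

With these values the variances come out near $\theta^2/(12n) \approx 1.67$ for $T_1$ and $\theta^2/(4n(n+2)) \approx 0.71$ for $T_2$, matching the claim that the scaled maximum beats the sample mean in this uniform model.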
