Solved – Prove that maximum likelihood estimators for Gaussian distribution are a global maximum

maximum-likelihood, normal-distribution, self-study

I'm studying from Casella-Berger; I'm at page 322, where it explains how to find the MLE for a Gaussian distribution with parameters $\mu$ and $\sigma^2$, both unknown.
It finds the MLEs, which are $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i-\bar{x})^2$, and up to this point everything is clear.
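(As a quick numeric sketch of what these formulas compute, assuming NumPy and some simulated data, the MLE for $\sigma^2$ is the biased sample variance, i.e. the one with the $1/n$ divisor:)

```python
import numpy as np

# Illustrative only: simulated data, any sample would do.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000)

mu_hat = x.mean()                        # MLE of mu: the sample mean
sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of sigma^2: 1/n divisor, not 1/(n-1)

# np.var uses ddof=0 (the 1/n divisor) by default, so it matches the MLE.
print(mu_hat, sigma2_hat, np.var(x))
```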

Now, it says it is difficult to prove analytically that these are indeed global maxima, and it uses this fact:

If $\theta \ne \bar{x}$ then $\sum (x_i-\theta)^2 > \sum (x_i-\bar{x})^2$.

It doesn't give any explanation for that. Is there something obvious I'm not seeing?

Best Answer

You can use a sum-of-squares argument to see this.

$$\sum_i (x_i-\theta)^2 = \sum_i (x_i - \bar{x}+\bar{x}-\theta)^2 = \sum_i (x_i-\bar{x})^2+n(\bar{x}-\theta)^2+\\\color{red}{2(\bar{x}-\theta)\sum_i(x_i-\bar{x})}$$

Now, $\bar{x}$ is defined so that the cross term is zero: $\sum_i (x_i - \bar{x}) = \sum_i x_i - n\bar{x} = 0$, since $\bar{x} = \frac{1}{n}\sum_i x_i$.

We are therefore left with:

$$\sum_i (x_i - \theta)^2 = \sum_i (x_i-\bar{x})^2+n(\bar{x}-\theta)^2$$

This is the variance/bias decomposition of the squared error associated with the estimator $\theta$. The first term on the RHS does not depend on $\theta$, and the second term $n(\bar{x}-\theta)^2$ is strictly positive whenever $\theta \ne \bar{x}$, so for all $\theta \ne \bar{x}$:

$$\sum_i (x_i - \theta)^2 > \sum_i (x_i - \bar{x})^2$$
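As a quick numerical illustration (a minimal sketch assuming NumPy; the data and the value of $\theta$ are arbitrary), both the decomposition and the strict inequality can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
xbar = x.mean()
theta = 0.7  # any value different from xbar

lhs = ((x - theta) ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + len(x) * (xbar - theta) ** 2

print(np.isclose(lhs, rhs))           # the decomposition holds (up to rounding)
print(lhs > ((x - xbar) ** 2).sum())  # strict inequality since theta != xbar
```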