Why does the MLE for a Gaussian distribution have only one solution, even though it is not "jointly" convex in mean and variance?

convex-optimization, convex-analysis, maximum-likelihood, normal-distribution

I am currently looking into the Maximum Likelihood Estimate (MLE) for the mean $\mu$ and variance $\sigma^2$ of a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, given a set of samples

$$\left\{x_i \ \vert\ x_i \in \mathbb{R}, i = 1, \ldots, n \right\}.$$

The MLE is given by

\begin{align*}
\mu^* &= \frac{1}{n} \sum_{i=1}^n x_i \\
\sigma^{2*} &= \frac{1}{n} \sum_{i=1}^n (x_i - \mu^*)^2
\end{align*}

and is indeed the global maximizer, as shown in this thread. It is, however, easy to show that the likelihood is not jointly concave in $\mu$ and $\sigma^2$, as done in this thread. More precisely, that thread considers the equivalent problem of minimizing the negative log-likelihood and shows that it is non-convex.
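Both facts are easy to check numerically. Below is a minimal NumPy sketch (the sample, grid ranges, and test points are arbitrary choices of mine): the closed-form MLE attains a negative log-likelihood no larger than any point on a coarse grid, while the negative log-likelihood violates the midpoint convexity inequality along the $\sigma^2$ direction for large $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)  # arbitrary synthetic sample
n = x.size

# Closed-form MLE
mu_star = x.mean()
sigma2_star = np.mean((x - mu_star) ** 2)

# Negative log-likelihood of N(mu, sigma2) for the sample x
def nll(mu, sigma2):
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

# 1) Global optimality: the closed-form MLE beats every point on a coarse grid.
best = nll(mu_star, sigma2_star)
grid_vals = [nll(m, s2)
             for m in np.linspace(mu_star - 3, mu_star + 3, 41)
             for s2 in np.linspace(0.1, 10 * sigma2_star, 41)]
assert best <= min(grid_vals)

# 2) Non-convexity: for sigma2 well above sigma2_star the NLL is concave
#    in sigma2, so the midpoint value exceeds the chord average.
a, b = (mu_star, 20.0), (mu_star, 60.0)
mid = (mu_star, 40.0)
assert nll(*mid) > (nll(*a) + nll(*b)) / 2
```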

I have already shown in that same thread that one can find a non-convex set $G$ containing the MLE on which the negative log-likelihood is convex.

Now my question: why are there no other minimizers of the negative log-likelihood outside the set $G$, i.e. no further solutions of the first-order condition

$$\nabla_{\mu,\sigma^2}\left[-\log\left(\prod_{i=1}^n \mathcal{N}(x_i \mid \mu, \sigma^2)\right)\right] = 0\quad?$$
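As a sanity check on this condition, one can verify with central finite differences that the gradient of the negative log-likelihood vanishes at the closed-form estimates (a NumPy sketch; the sample, step size, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)  # arbitrary synthetic sample
n = x.size

# Negative log-likelihood of N(mu, sigma2) for the sample x
def nll(mu, sigma2):
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

mu_star = x.mean()
sigma2_star = np.mean((x - mu_star) ** 2)

# Central finite differences of the NLL at the closed-form MLE
h = 1e-6
d_mu = (nll(mu_star + h, sigma2_star) - nll(mu_star - h, sigma2_star)) / (2 * h)
d_s2 = (nll(mu_star, sigma2_star + h) - nll(mu_star, sigma2_star - h)) / (2 * h)

# Both partial derivatives vanish (up to finite-difference noise)
assert abs(d_mu) < 1e-4
assert abs(d_s2) < 1e-4
```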

More generally, is it possible that a function is non-convex on $\mathbb{R}^n$ but has only one minimizer (or a set of minimizers with the same function value)?
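For instance, the one-dimensional function $f(x) = x^2 + 3\sin^2(x)$ (a toy example of my own choosing) seems to qualify: it is non-convex, yet $f(0) = 0$ while $f(x) \ge x^2 > 0$ for $x \ne 0$, so the minimizer is unique. A quick NumPy check (test points and grid are arbitrary):

```python
import numpy as np

# Toy function: non-convex on R, yet with a unique global minimizer at x = 0.
f = lambda x: x**2 + 3 * np.sin(x)**2

# Convexity fails: the midpoint value exceeds the chord average at these points.
a, b = 1.2, 2.0
assert f((a + b) / 2) > (f(a) + f(b)) / 2

# Unique minimizer at x = 0: f(0) = 0 and f(x) >= x^2 > 0 elsewhere.
xs = np.linspace(-10, 10, 100001)
assert f(0.0) == 0.0
assert np.min(f(xs[xs != 0])) > 0
```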

Best Answer

The key point is that the location of the maximum over $\mu$ does not depend on $\sigma^2$: for every fixed $\sigma^2$, the likelihood is maximized at $\mu^* = \frac{1}{n}\sum_{i=1}^n x_i$. Plugging this value in leaves a one-dimensional problem in $\sigma^2$, which has a unique stationary point, namely $\sigma^{2*}$. You can therefore optimize sequentially, and joint concavity is not needed: every joint stationary point must have $\mu = \mu^*$, and then $\sigma^2 = \sigma^{2*}$.
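This sequential argument can be illustrated numerically (a NumPy sketch; the sample, grids, and tolerances are arbitrary choices): for each fixed $\sigma^2$ the minimizing $\mu$ is the sample mean, and profiling $\mu$ out leaves a one-dimensional problem in $\sigma^2$ with a unique minimizer:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=-1.0, scale=2.0, size=300)  # arbitrary synthetic sample
n = x.size

# Negative log-likelihood of N(mu, sigma2) for the sample x
def nll(mu, sigma2):
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

# Step 1: for several fixed sigma2, the minimizing mu is the sample mean,
# independently of sigma2.
for s2 in [0.5, 1.0, 4.0]:
    mus = np.linspace(x.mean() - 2, x.mean() + 2, 2001)
    best_mu = mus[np.argmin([nll(m, s2) for m in mus])]
    assert abs(best_mu - x.mean()) < 1e-2

# Step 2: plug mu* in; the profile NLL in sigma2 alone has a unique minimizer,
# which matches the closed-form variance estimate.
mu_star = x.mean()
sigma2_star = np.mean((x - mu_star) ** 2)
s2s = np.linspace(0.05, 20.0, 4000)
best_s2 = s2s[np.argmin([nll(mu_star, s) for s in s2s])]
assert abs(best_s2 - sigma2_star) < 0.05
```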