Prove Negative Log Likelihood for Gaussian Distribution is Convex – Optimization Guide

convex optimization, maximum likelihood, normal distribution

I am looking to compute maximum likelihood estimators for $\mu$ and $\sigma^2$, given $n$ i.i.d. random variables drawn from a Gaussian distribution. I believe I know how to write the expression for the negative log likelihood (kindly see below); however, before I take derivatives with respect to $\mu$ and $\sigma^2$, I want to prove that the neg. log likelihood is a convex function in $\mu$ and $\sigma^2$.

This is where I'm stuck – I'm unable to prove that the Hessian is Positive Semidefinite.

The negative log-likelihood function,
$$ l(\mu, \sigma^2) = \frac{n}{2}\ln(2\pi) + \frac{n}{2}\ln(\sigma^2) + \sum_{i=1}^n \frac{(x_i - \mu)^2}{2\sigma^2}$$
Let $\alpha = \frac{1}{\sigma^2}$ (The book Convex Optimization by Boyd & Vandenberghe notes in Section 7.1 that this transformation should make the neg. log-likelihood convex in $\alpha$). We now get,
$$ l(\mu, \alpha) = \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\alpha) + \sum_{i=1}^n \frac{(x_i - \mu)^2\alpha}{2}$$
$$ = \frac{n}{2}\ln(2\pi) + \frac{1}{2}\sum_{i=1}^n\left(-\ln(\alpha) + (x_i - \mu)^2\alpha\right)$$

Define,
$$g_i(\mu, \alpha) = -\ln(\alpha) + (x_i - \mu)^2\alpha $$

Now my approach is to show that each $g_i(\mu, \alpha)$ is jointly convex in $(\mu, \alpha)$ and use that to say that $l(\mu, \alpha)$, being a constant plus a nonnegative weighted sum of the convex $g_i$'s, is also convex in $(\mu, \alpha)$. The Hessian for $g_i$ is:

$$ \nabla^2g_i =
\begin{pmatrix}
2\alpha & -2(x_i - \mu)\\
-2(x_i - \mu) & \frac{1}{\alpha^2} \\
\end{pmatrix}
$$

And the determinant of the Hessian is,
$$ \lvert \nabla^2g_i \rvert = \frac{2}{\alpha} - 4(x_i - \mu)^2$$
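(As a sanity check on my algebra, the Hessian entries and the determinant above can be reproduced symbolically; the SymPy sketch below is not part of the derivation, just a verification.)

```python
import sympy as sp

mu, x_i = sp.symbols('mu x_i')
alpha = sp.Symbol('alpha', positive=True)  # alpha = 1/sigma^2 > 0

# g_i(mu, alpha) = -ln(alpha) + (x_i - mu)^2 * alpha
g = -sp.log(alpha) + (x_i - mu) ** 2 * alpha

# 2x2 Hessian in (mu, alpha)
H = sp.hessian(g, (mu, alpha))
print(H)
# determinant should simplify to 2/alpha - 4*(x_i - mu)^2
print(sp.simplify(H.det()))
```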
This is where I'm stuck – I cannot show that this determinant is non-negative for all values of $\mu$ and $\alpha (>0)$. Kindly help figure out my conceptual or other errors.

Kindly note I've consulted the following similar queries:
How to prove the global maximum log likelihood function of a normal distribution is concave

and Proving MLE for normal distribution

However, both of them only show that the Hessian is positive semidefinite at the point where $\mu$ and $\alpha$ equal their estimated values. The mistake I see is that those estimates were arrived at in the first place by assuming the neg. log-likelihood is convex (i.e. by equating the gradient to 0, which is the optimality criterion for a convex function).

Thanks

Best Answer

So you get $$l(\mu,\alpha) = \frac{n}{2}\ln 2\pi - \frac{n}{2} \ln \alpha + \sum \frac{(x_i - \mu)^2\alpha}{2}$$

Convex in $\mu$

The second derivative w.r.t. $\mu$ is $$\frac{\partial^2}{\partial \mu^2}l = n \alpha > 0$$ So we get convexity in $\mu$.

Convex in $\alpha$

The second derivative w.r.t. $\alpha$ is $$\frac{\partial^2}{\partial \alpha^2}l = \frac{n}{2\alpha^2} > 0$$ So we get convexity in $\alpha$.
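These two marginal second derivatives can be checked symbolically; a small SymPy sketch (with the sample size fixed at $n=3$ for concreteness):

```python
import sympy as sp

mu = sp.Symbol('mu')
alpha = sp.Symbol('alpha', positive=True)  # alpha = 1/sigma^2 > 0
n = 3                    # concrete sample size for the check
xs = sp.symbols('x1:4')  # sample values x1, x2, x3

# negative log-likelihood in (mu, alpha)
l = (sp.Rational(n, 2) * sp.log(2 * sp.pi)
     - sp.Rational(n, 2) * sp.log(alpha)
     + sum((x - mu) ** 2 * alpha / 2 for x in xs))

d2_mu = sp.simplify(sp.diff(l, mu, 2))        # n*alpha, positive for alpha > 0
d2_alpha = sp.simplify(sp.diff(l, alpha, 2))  # n/(2*alpha^2), always positive
print(d2_mu, d2_alpha)
```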

What I think you meant is that you want to prove that $l(\pmb{z})$ is convex in $\pmb{z} = [\mu, \alpha]$, i.e. jointly convex. It is not: the Hessian you wrote fails to be positive semidefinite for some values of $x_i, \mu, \alpha$. Choose $\alpha$ large enough (or $(x_i - \mu)^2$ large enough) that $\frac{2}{\alpha} < 4(x_i - \mu)^2$, and the determinant is negative. Boyd does not tell you that $l(\mu,\alpha)$ is jointly convex in $(\mu,\alpha)$. The statement "convex in mean and variance" means that it is convex in the mean and it is convex in the variance, separately.
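To make this concrete, here is a quick numerical sketch; the point $(x_i, \mu, \alpha)$ below is chosen arbitrarily so that $\frac{2}{\alpha} < 4(x_i - \mu)^2$:

```python
import numpy as np

# Hessian of g_i(mu, alpha) = -ln(alpha) + (x_i - mu)^2 * alpha,
# at a point chosen to violate the determinant condition.
x_i, mu, alpha = 0.0, 2.0, 1.0  # (x_i - mu)^2 = 4, so 2/alpha = 2 < 16

H = np.array([[2 * alpha,       -2 * (x_i - mu)],
              [-2 * (x_i - mu), 1 / alpha**2]])

det = 2 / alpha - 4 * (x_i - mu) ** 2  # = 2 - 16 = -14
eigvals = np.linalg.eigvalsh(H)        # ascending order

print(det)      # negative determinant
print(eigvals)  # one negative and one positive eigenvalue: indefinite
```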

The links you shared address something completely different. They want to show that the value at the optimum is concave (at least this is what they state).