optimization – When is the Likelihood Function Positive Semidefinite?

fisher-information, likelihood, optimization

This may be a very misinformed question, but I can't figure out why it's not true. Here goes:

According to Wikipedia and this post, the Hessian of a likelihood function equals the information matrix, or equivalently the covariance matrix of the score functions, i.e.:

$$I(\theta)_{i,j} = \mathrm{E}_\theta \left[ \left(\partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\partial_j \log f_{X\mid\Theta}(X\mid\theta)\right)\right] = -\mathrm{E}\left[\left.\frac{\partial^{2} \log f(X\mid\theta)}{\partial \theta_{i} \, \partial \theta_{j}}\right|\theta\right]$$

If this is true, wouldn't these conclusions follow:

  1. The Hessian of the likelihood function is always positive semidefinite (PSD)

  2. The likelihood function is thus always convex (since the 2nd derivative is PSD)

  3. The likelihood function will have no local minima, only global minima!!!

These results seem too good to be true, but I can't seem to understand why they are false.

Thanks!

Best Answer

The Fisher Information is defined as

$${\left(\mathcal{I} \left(\theta \right) \right)}_{i, j} = \operatorname{E} \left[\left. \left(\frac{\partial}{\partial\theta_i} \log f(X;\theta)\right) \left(\frac{\partial}{\partial\theta_j} \log f(X;\theta)\right) \right|\theta\right]$$

(the question in the post you linked to states mistakenly otherwise, and the answer politely corrects it).
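
If it helps to see this definition in action, here is a minimal Monte Carlo sketch (my own illustration, not part of the linked post; the Bernoulli model and the value p = 0.3 are assumptions made purely for the example). It checks that the mean score is close to zero and that the mean squared score matches the analytic Fisher Information $1/(p(1-p))$:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                   # assumed "true" parameter, for illustration only
x = rng.binomial(1, p, size=1_000_000)    # many independent single Bernoulli observations

# Score of one Bernoulli observation: d/dp log f(x; p) = x/p - (1 - x)/(1 - p)
score = x / p - (1 - x) / (1 - p)

print("E[score]         ~", score.mean())          # should be close to 0
print("E[score^2]       ~", (score ** 2).mean())   # Monte Carlo estimate of I(p)
print("1 / (p (1 - p))  =", 1 / (p * (1 - p)))     # analytic Fisher Information
```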

Under the following regularity conditions:
1) The support of the random variable involved does not depend on the unknown parameter vector
2) The derivatives of the log-likelihood w.r.t. the parameters exist up to third order
3) The expected value of the squared first derivative is finite

and under the assumption that the specification is correct (i.e. the specified distribution family includes the actual distribution that the random variable follows)
then the Fisher Information equals the negative of the expected Hessian of the log-likelihood for one observation. This equality is called the "Information Matrix Equality" for obvious reasons.
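
In symbols, for a single observation and under correct specification,

$$\mathcal{I}(\theta) = -\operatorname{E}\left[\left.\frac{\partial^{2} \log f(X;\theta)}{\partial\theta \, \partial\theta^{\mathsf T}}\right|\theta\right],$$

i.e. the equality is about the expected Hessian, evaluated under the true distribution, not about the Hessian of a particular sample's log-likelihood at an arbitrary parameter value.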

While the three regularity conditions are relatively "mild" (or at least can be checked), the assumption of correct specification is at the heart of the issues of statistical inference, especially with observational data. It is simply too strong a condition to be accepted easily. This is why it is a major issue to actually prove that the log-likelihood is concave in the parameters (which in many cases leads to consistency and asymptotic normality irrespective of whether the specification is correct, the quasi-MLE case), rather than to just assume it by assuming that the Information Matrix Equality holds.
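
To see concretely that nothing forces the sample log-likelihood to be concave, here is a minimal numerical sketch (my own illustration; the Cauchy location model and the sample values are assumptions chosen for the example): even a correctly specified model that satisfies the regularity conditions can produce a log-likelihood with several local maxima.

```python
import numpy as np

# Illustrative (made-up) sample; a Cauchy location model with known scale 1 is assumed.
x = np.array([-5.0, -1.0, 0.5, 4.0, 6.0])

def loglik(mu):
    # Cauchy(mu, 1) log-likelihood of the sample, additive constants dropped
    return -np.sum(np.log1p((x - mu) ** 2))

mus = np.linspace(-10.0, 10.0, 2001)
ll = np.array([loglik(m) for m in mus])

# Interior grid points that beat both neighbours are (approximate) local maxima
is_local_max = (ll[1:-1] > ll[:-2]) & (ll[1:-1] > ll[2:])
print("local maxima near mu =", mus[1:-1][is_local_max])
```

For this sample the grid search reports two distinct local maxima, roughly one near each cluster of observations, so a hill-climbing routine can end up in a different place depending on where it starts, even though the model is correctly specified.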

So you were absolutely right in thinking "too good to be true".

As a side note, you neglected the presence of the minus sign: the expected Hessian of the log-likelihood (for one observation) would be negative semidefinite, as it should be, since we seek to maximize the log-likelihood, not minimize it.
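
As a quick check of the sign in the simplest possible case, take $X \sim N(\mu, \sigma^2)$ with $\sigma^2$ known. Then

$$\log f(x;\mu) = -\tfrac{1}{2}\log(2\pi\sigma^{2}) - \frac{(x-\mu)^{2}}{2\sigma^{2}}, \qquad \frac{\partial^{2} \log f(x;\mu)}{\partial\mu^{2}} = -\frac{1}{\sigma^{2}} < 0,$$

so the Hessian of the log-likelihood is negative definite and $\mathcal{I}(\mu) = -\operatorname{E}\left[\partial^{2}\log f/\partial\mu^{2}\right] = 1/\sigma^{2}$, consistent with maximization rather than minimization.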