Gaussian Mixture Model – Why Can’t MLE Be Implemented Directly?

Tags: gaussian-mixture-distribution, latent-variable, maximum-likelihood, self-study

Consider the following density, the mixture of two Gaussian distributions,
\begin{align*}
p(x)= p(k=1) N(x|\mu_1,\sigma^2_1) + p(k=0) N(x|\mu_0,\sigma^2_0) ,
\end{align*}

where $\pi_1 = p(k=1)$ and $\pi_0 = p(k=0)$ satisfy $\pi_1+\pi_0=1$, and $N(x|\mu,\sigma^2)$ denotes the density of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$.
The parameters of interest are $\pi_0$, the $\mu_i$'s, and the $\sigma^2_i$'s.
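
For concreteness, here is a minimal sketch of this density in Python (using numpy/scipy; the parameter values below are made up purely for illustration):

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, pi0, mu0, sigma0, mu1, sigma1):
    """Density of the two-component Gaussian mixture:
    p(x) = pi1 * N(x | mu1, sigma1^2) + pi0 * N(x | mu0, sigma0^2),
    with pi1 = 1 - pi0."""
    pi1 = 1.0 - pi0
    return (pi0 * norm.pdf(x, loc=mu0, scale=sigma0)
            + pi1 * norm.pdf(x, loc=mu1, scale=sigma1))

# Illustrative (made-up) parameter values.
x = np.linspace(-5, 5, 7)
print(mixture_pdf(x, pi0=0.3, mu0=-1.0, sigma0=1.0, mu1=2.0, sigma1=0.5))
```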

This Q & A shows the MLE for the mixture of two Gaussian distributions when the latent variables $K_i$ are observed.
In this question, suppose we only observe the $X_i$'s; the latent variables $K_i$ are unobserved.
Classical methods for estimating these $5$ unknown parameters are the EM algorithm and MCMC sampling; see Hastie et al. (2009) for details.

Why can't MLE be implemented directly for the Gaussian mixture model?


(Some attempt)

If the $k_i$'s were observed, the complete-data log-likelihood would be
\begin{align*}
\ln p(x,k|\theta) = \sum_{i=1}^n \bigg[ (1-k_i) \big(\ln \pi_0 + \ln N(x_i|\mu_0,\sigma_0^2)\big) + k_i \big(\ln \pi_1 + \ln N(x_i|\mu_1,\sigma_1^2)\big) \bigg].
\end{align*}
Since the $k_i$'s are unobserved here, the observed-data log-likelihood is instead
\begin{align*}
\ln p(x|\theta) = \sum_{i=1}^n \ln \big[ \pi_0\, N(x_i|\mu_0,\sigma_0^2) + \pi_1\, N(x_i|\mu_1,\sigma_1^2) \big],
\end{align*}
in which the sum over components sits inside the logarithm and does not separate.
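
To make the contrast concrete, here is a minimal sketch (in Python with numpy/scipy, on simulated data with made-up parameter values) that evaluates both quantities; the complete-data version splits into per-component sums, while the observed-data version has the sum trapped inside the logarithm:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulate data where the labels k_i ARE known, for comparison only.
n, pi0, mu0, sigma0, mu1, sigma1 = 500, 0.3, -1.0, 1.0, 2.0, 0.5
k = rng.binomial(1, 1 - pi0, size=n)          # latent labels
x = np.where(k == 0,
             rng.normal(mu0, sigma0, size=n),
             rng.normal(mu1, sigma1, size=n))

# Complete-data log-likelihood: decomposes into per-component sums.
complete = np.sum((1 - k) * (np.log(pi0) + norm.logpdf(x, mu0, sigma0))
                  + k * (np.log(1 - pi0) + norm.logpdf(x, mu1, sigma1)))

# Observed-data log-likelihood: the sum sits INSIDE the logarithm,
# so it does not split into closed-form per-component terms.
observed = np.sum(np.log(pi0 * norm.pdf(x, mu0, sigma0)
                         + (1 - pi0) * norm.pdf(x, mu1, sigma1)))

print(complete, observed)
```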

Best Answer

As @ChristianHenning has pointed out in the comments, the mixing of components makes the likelihood analytically intractable: the sum over components sits inside the logarithm, so the score equations have no closed-form solution. Expanding the product of $n$ such sums yields $2^n$ terms, and this combinatorial explosion likewise makes exact Bayesian computation impractical; see, e.g., the discussion and references in these notes. Furthermore, the likelihood is multimodal (at a minimum, every solution has a label-swapped twin), which makes it difficult to use direct numerical maximization or simpler Monte Carlo algorithms such as Gibbs sampling or Metropolis-Hastings.
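
To illustrate the multimodality point, here is a minimal sketch (in Python with numpy/scipy; the simulated data, parameterization, and starting points are all made up for illustration) that maximizes the observed-data log-likelihood numerically from two different starting points:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
# Simulated data from a two-component mixture (made-up parameters).
x = np.concatenate([rng.normal(-1.0, 1.0, 150),
                    rng.normal(2.0, 0.5, 350)])

def neg_loglik(theta):
    # theta = (logit(pi0), mu0, log(sigma0), mu1, log(sigma1)):
    # an unconstrained parameterization so the optimizer stays feasible.
    a, mu0, ls0, mu1, ls1 = theta
    pi0 = 1.0 / (1.0 + np.exp(-a))
    dens = (pi0 * norm.pdf(x, mu0, np.exp(ls0))
            + (1.0 - pi0) * norm.pdf(x, mu1, np.exp(ls1)))
    return -np.sum(np.log(dens))

# Two starting points that differ only by swapping the component labels.
for start in ([0.0, -2.0, 0.0, 3.0, 0.0],
              [0.0, 3.0, 0.0, -2.0, 0.0]):
    res = minimize(neg_loglik, start, method="Nelder-Mead")
    print(np.round(res.x, 2), round(res.fun, 2))
```

The two runs typically reach the same likelihood value with the component labels exchanged, which is exactly the (at least) bimodal structure described above; less symmetric starting points can get stuck in worse local optima.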