Solved – Why is the expectation maximization algorithm used

expectation-maximization

From what little I know, the EM algorithm can be used to find the maximum likelihood when setting the partial derivatives of the likelihood with respect to the parameters to zero yields a set of equations that cannot be solved analytically. But is the EM algorithm actually needed, rather than some numerical technique that tries to find a maximum of the likelihood subject to the set of equations mentioned?

Best Answer

The question is legitimate, and I had the same confusion when I first learnt the EM algorithm.

In general terms, the EM algorithm defines an iterative process that allows you to maximize the likelihood function of a parametric model when some variables of the model are (or are treated as) "latent" or unknown.

In theory, for the same purpose, you could use a generic optimization routine to numerically maximize the likelihood (or minimize its negative logarithm) over all parameters at once. In practice, however, this direct optimization would be:

  1. much more computationally intensive
  2. less robust

A very common application of the EM method is fitting a mixture model. In this case, treating the variables that assign each sample to one of the components as "latent" variables greatly simplifies the problem.

Let's look at an example. We have $N$ samples $s = \{s_i\}$ drawn from a mixture of two normal distributions. To find the parameters without EM we would have to minimize the negative log-likelihood

$$-\log \mathcal{L}(s,\theta) = -\sum_{i=1}^{N}\log\Big[ \frac{a_1}{\sqrt{2\pi}\,\sigma_1} \exp\Big( -\frac{(s_i-\mu_1)^2}{2\sigma_1^2}\Big) + \frac{a_2}{\sqrt{2\pi}\,\sigma_2} \exp\Big(-\frac{(s_i-\mu_2)^2}{2\sigma_2^2}\Big) \Big]$$

with respect to all the parameters $\theta = (a_1, a_2, \mu_1, \mu_2, \sigma_1, \sigma_2)$ at once.
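For concreteness, here is a minimal sketch, not part of the original answer, of what this direct minimization could look like in Python with `scipy.optimize.minimize`; the synthetic data, the parameterization (mixing weight `a1` with $a_2 = 1 - a_1$, log standard deviations), and the starting point are all illustrative assumptions.

```python
# Hedged sketch: direct numerical minimization of the 2-component Gaussian
# mixture negative log-likelihood. All parameter values below are illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Synthetic data drawn from an assumed mixture of two normals.
s = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])

def neg_log_likelihood(theta, s):
    """-log L(s, theta); theta = (a1, mu1, mu2, log_sigma1, log_sigma2), a2 = 1 - a1."""
    a1, mu1, mu2, log_s1, log_s2 = theta
    a1 = np.clip(a1, 1e-6, 1 - 1e-6)
    pdf = (a1 * norm.pdf(s, mu1, np.exp(log_s1))
           + (1 - a1) * norm.pdf(s, mu2, np.exp(log_s2)))
    return -np.sum(np.log(pdf + 1e-300))

# A generic optimizer needs a starting guess and can stall in poor local optima,
# which is the "less robust" point made above.
theta0 = np.array([0.5, -1.0, 1.0, 0.0, 0.0])
res = minimize(neg_log_likelihood, theta0, args=(s,), method="Nelder-Mead")
print(res.x)
```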

Using the EM algorithm instead, we first "assign" each sample to a component, in practice by computing how much each component is responsible for it (E step), and then fit (or maximize the likelihood of) each component separately (M step). In this example the M step reduces to a weighted mean and variance for each $\mu_k$ and $\sigma_k$. Iterating these two steps is a simpler and more robust way to minimize $-\log \mathcal{L}(s,\theta)$.
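Here is a matching sketch of the EM iterations for the same model, again only illustrative: the function name `em_two_gaussians`, the initialization, and the fixed number of iterations are my own choices, not part of the original answer.

```python
# Hedged sketch of EM for a 2-component Gaussian mixture: the E step computes
# responsibilities (soft assignments), the M step updates each component with
# closed-form weighted means and variances.
import numpy as np
from scipy.stats import norm

def em_two_gaussians(s, n_iter=100):
    # Crude, illustrative initialization of weights, means and standard deviations.
    a = np.array([0.5, 0.5])
    mu = np.array([s.min(), s.max()])
    sigma = np.array([s.std(), s.std()])
    for _ in range(n_iter):
        # E step: responsibility of each component for each sample, shape (N, 2).
        dens = a[None, :] * norm.pdf(s[:, None], mu[None, :], sigma[None, :])
        r = dens / dens.sum(axis=1, keepdims=True)
        # M step: closed-form weighted updates, no generic optimizer needed.
        nk = r.sum(axis=0)
        a = nk / len(s)
        mu = (r * s[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (s[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk)
    return a, mu, sigma

# Usage on the synthetic data from the previous sketch:
# a, mu, sigma = em_two_gaussians(s)
```

Each M step is a simple closed-form update, which is why the EM route is typically cheaper and more stable than handing all six parameters to a general-purpose optimizer.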
