Solved – Derivation of maximum likelihood for a Gaussian mixture model

expectation-maximization, gaussian-mixture-distribution, maximum-likelihood, normal-distribution, probability

I'm working my way through the derivation of EM in Bishop (p. 435).

I'm stuck trying to derive the MLE for $\mu_k$ for the Gaussian mixture model.

Basically I get an extra sum in the numerator.

For those that don't have the book:

The log likelihood for the Gaussian mixture model is:

$$ \ln p(X|\pi,\mu,\Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^K \pi_k N(x_n|\mu_k,\Sigma_k) \right\} $$
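As a sanity check, this log likelihood is easy to evaluate numerically. The sketch below (names like `gmm_log_likelihood` are my own, not from the book) computes it directly from the formula above using plain numpy:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """ln p(X | pi, mu, Sigma) = sum_n ln { sum_k pi_k N(x_n | mu_k, Sigma_k) }."""
    return sum(
        np.log(sum(p * gauss_pdf(x, m, S) for p, m, S in zip(pis, mus, Sigmas)))
        for x in X
    )
```

With a single component ($K=1$, $\pi_1=1$) this reduces to the ordinary Gaussian log likelihood, which gives a quick way to test the implementation.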

When I take the derivative with respect to $\mu_k$:

  1. Recognise that we're dealing with $\ln f(x)$, whose derivative is $\frac{f'(x)}{f(x)}$

  2. This gives us:

$$ \sum_{n=1}^{N} \frac{1}{\sum_{k=1}^K \pi_k N(x_n|\mu_k,\Sigma_k)} \times \frac{\partial \sum_{k=1}^K \pi_k N(x_n|\mu_k,\Sigma_k) }{\partial \mu_k} $$

  3. Now we only have to solve the derivative in the right-most term:

$$ \frac{ \partial \sum_{k=1}^K \pi_k N(x_n|\mu_k,\Sigma_k) }{\partial \mu_k} = \sum_{k=1}^K -0.5(2\Sigma_k^{-1}(x_n-\mu_k))\times \pi_k N(x_n|\mu_k,\Sigma_k) $$

  4. This leaves me with:

$$ \sum_{n=1}^{N} \frac{ \sum_{k=1}^K \pi_k N(x_n|\mu_k,\Sigma_k) }{\sum_{k=1}^K \pi_k N(x_n|\mu_k,\Sigma_k)} \times -0.5(2\Sigma_k^{-1}(x_n-\mu_k)) $$

  5. The solution in the book is:

$$ \sum_{n=1}^{N} \frac{ \pi_k N(x_n|\mu_k,\Sigma_k) }{\sum_{j} \pi_j N(x_n|\mu_j,\Sigma_j)} \times 0.5(2\Sigma_k^{-1}(x_n-\mu_k)) $$

How is it that

  1. There's no summation in their numerator?

  2. They've changed the summation index from $k$ to $j$?

  3. They have a positive final term, whereas I have a negative?

Thanks

Best Answer

To avoid any confusion, the summation index and the index of the $\mu$ you differentiate with respect to should be different. From the beginning, write the likelihood with summation index $j$ and differentiate with respect to $\mu_k$:

$$\frac{\partial \sum_{j=1}^K \pi_j N(x_n|\mu_j,\Sigma_j) }{\partial \mu_k}= \pi_k\frac{\partial N(x_n|\mu_k,\Sigma_k)}{\partial \mu_k}$$ since only the $j=k$ term depends on $\mu_k$. This explains why the answer doesn't have a summation in the numerator.
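Carrying this through the chain rule, the derivative of the log likelihood becomes (in Bishop's notation, where the ratio is the responsibility $\gamma(z_{nk})$):

$$\frac{\partial \ln p(X|\pi,\mu,\Sigma)}{\partial \mu_k} = \sum_{n=1}^{N} \underbrace{\frac{\pi_k N(x_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x_n|\mu_j,\Sigma_j)}}_{\gamma(z_{nk})} \, \Sigma_k^{-1}(x_n-\mu_k)$$

Setting this to zero and solving gives the familiar M-step update $\mu_k = \frac{1}{N_k}\sum_{n=1}^N \gamma(z_{nk})\, x_n$ with $N_k = \sum_n \gamma(z_{nk})$.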

You'll get one minus from differentiating $(x_n-\mu_k)$ with respect to $\mu_k$ (which gives $-1$), and another minus from the $\exp(-(\ldots))$ expression in the normal PDF. The two cancel each other out, which is why the book's final term is positive.
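If the sign still feels suspicious, you can check the analytic gradient against finite differences. This is a quick numerical sketch (not from the book; helper names are my own) comparing $\sum_n \gamma(z_{nk})\,\Sigma_k^{-1}(x_n-\mu_k)$, with the positive sign, to a central-difference gradient of the log likelihood:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def log_likelihood(X, pis, mus, Sigmas):
    return sum(
        np.log(sum(p * gauss_pdf(x, m, S) for p, m, S in zip(pis, mus, Sigmas)))
        for x in X
    )

def analytic_grad_mu(X, pis, mus, Sigmas, k):
    # sum_n gamma(z_nk) * Sigma_k^{-1} (x_n - mu_k) -- note the POSITIVE sign
    grad = np.zeros_like(mus[k])
    for x in X:
        dens = sum(p * gauss_pdf(x, m, S) for p, m, S in zip(pis, mus, Sigmas))
        gamma = pis[k] * gauss_pdf(x, mus[k], Sigmas[k]) / dens
        grad += gamma * np.linalg.solve(Sigmas[k], x - mus[k])
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
pis = [0.4, 0.6]
mus = [np.array([0.0, 0.0]), np.array([1.0, -1.0])]
Sigmas = [np.eye(2), 2 * np.eye(2)]

# central-difference gradient with respect to mu_0
k, eps = 0, 1e-6
num_grad = np.zeros(2)
for i in range(2):
    mp = [m.copy() for m in mus]; mm = [m.copy() for m in mus]
    mp[k][i] += eps; mm[k][i] -= eps
    num_grad[i] = (log_likelihood(X, pis, mp, Sigmas)
                   - log_likelihood(X, pis, mm, Sigmas)) / (2 * eps)

print(np.allclose(num_grad, analytic_grad_mu(X, pis, mus, Sigmas, k), atol=1e-5))
```

If the analytic expression had the wrong sign, the finite-difference check would fail immediately, so this is a handy way to debug hand-derived gradients.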