Solved – Derivation of M-step in EM algorithm for mixture of Gaussians

Tags: expectation-maximization, gaussian-mixture-distribution

I am trying to derive the parameter estimation equations for the M-step of the expectation-maximization (EM) algorithm for a mixture of Gaussians in which all components share the same covariance matrix $\mathbf{\Sigma}$.

Pattern Recognition and Machine Learning by Bishop has a section on EM for Gaussian mixtures, and it includes a derivation of the M-step when all $K$ Gaussians have different covariance matrices $\mathbf{\Sigma_k}$. I think that if I can understand this derivation well, I can modify it to get what I want.

I understand the derivation given by Bishop for the M-step equation for $\mathbf{\mu}_k$. However, the book does not show detailed steps for the derivation of the M-step for $\mathbf{\Sigma}_k$. When I tried to derive it myself by computing $\frac{\partial L}{\partial \mathbf{\Sigma}_k}$ and setting it to $0$, I came across the following derivative that I don't know how to deal with:

$$
\frac{\partial}{\partial \mathbf{\Sigma}_k} \left( (2\pi)^{-d/2}\,|\mathbf{\Sigma}_k|^{-1/2}\,e^{-\frac{1}{2}(\mathbf{x}-\mathbf{\mu}_k)^T\mathbf{\Sigma}_k^{-1}(\mathbf{x}-\mathbf{\mu}_k)}\right)
$$

Basically, it's the derivative of the multivariate Gaussian PDF with respect to the covariance matrix. How do I compute this derivative? I've computed the derivative of the logarithm of this function before, when studying Gaussian Bayes classifiers, and it was much easier, which makes me suspect I've taken a wrong turn somewhere.

Best Answer

I've found the answer and I'm posting it for posterity. I mentioned in the question that computing the derivative of the logarithm of the PDF was easier. It turns out that this can be used to compute the derivative of the PDF itself:

$$ \frac{\partial \ln (f)}{\partial \mathbf{\Sigma}_k} = \frac{1}{f} \frac{\partial f}{\partial \mathbf{\Sigma}_k}\\ \Rightarrow \frac{\partial f}{\partial \mathbf{\Sigma}_k} = f \cdot\frac{\partial \ln (f)}{\partial \mathbf{\Sigma}_k} $$
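For completeness (this step isn't spelled out above), $\frac{\partial \ln(f)}{\partial \mathbf{\Sigma}_k}$ follows from two standard matrix-calculus identities, $\frac{\partial}{\partial \mathbf{\Sigma}} \ln|\mathbf{\Sigma}| = \mathbf{\Sigma}^{-1}$ and $\frac{\partial}{\partial \mathbf{\Sigma}}\, \mathbf{d}^T\mathbf{\Sigma}^{-1}\mathbf{d} = -\mathbf{\Sigma}^{-1}\mathbf{d}\mathbf{d}^T\mathbf{\Sigma}^{-1}$, evaluated at symmetric $\mathbf{\Sigma}$ (ignoring the usual symmetry correction, as is common). Writing $\mathbf{d} = \mathbf{x}-\mathbf{\mu}_k$:

$$
\begin{aligned}
\ln f &= -\tfrac{d}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\mathbf{\Sigma}_k| - \tfrac{1}{2}\,\mathbf{d}^T\mathbf{\Sigma}_k^{-1}\mathbf{d} \\
\frac{\partial \ln f}{\partial \mathbf{\Sigma}_k} &= -\tfrac{1}{2}\mathbf{\Sigma}_k^{-1} + \tfrac{1}{2}\,\mathbf{\Sigma}_k^{-1}\mathbf{d}\mathbf{d}^T\mathbf{\Sigma}_k^{-1}
\end{aligned}
$$

Multiplying by $f$, as above, then gives the derivative of the PDF itself.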

Also, it turns out that differentiating the PDF with respect to the precision matrix $\mathbf{\Sigma}^{-1}$ instead of $\mathbf{\Sigma}$ is easier and leads to the same answer.
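As an illustrative sanity check (not part of the original answer), the closed-form gradient $\frac{\partial \ln f}{\partial \mathbf{\Sigma}} = -\frac{1}{2}\left(\mathbf{\Sigma}^{-1} - \mathbf{\Sigma}^{-1}\mathbf{d}\mathbf{d}^T\mathbf{\Sigma}^{-1}\right)$ with $\mathbf{d} = \mathbf{x}-\mathbf{\mu}$ can be compared against finite differences in NumPy; all names and values here are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Illustrative test point, mean, and covariance (arbitrary values).
x = rng.standard_normal(d)
mu = rng.standard_normal(d)
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)  # symmetric positive definite

def log_pdf(S):
    """log N(x | mu, S); slogdet/solve also handle the slightly
    non-symmetric matrices produced by entrywise perturbation."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(S, diff))

# Closed-form gradient of ln f w.r.t. Sigma (entrywise convention).
diff = x - mu
Sinv = np.linalg.inv(Sigma)
grad_closed = -0.5 * (Sinv - Sinv @ np.outer(diff, diff) @ Sinv)

# Central finite differences, perturbing each entry independently.
eps = 1e-6
grad_fd = np.zeros_like(Sigma)
for i in range(d):
    for j in range(d):
        Sp, Sm = Sigma.copy(), Sigma.copy()
        Sp[i, j] += eps
        Sm[i, j] -= eps
        grad_fd[i, j] = (log_pdf(Sp) - log_pdf(Sm)) / (2 * eps)

print(np.abs(grad_closed - grad_fd).max())  # should be tiny
```

Multiplying `grad_closed` by `np.exp(log_pdf(Sigma))` gives the derivative of the PDF itself, matching the product-rule identity above.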