Expectation Maximization – Why It Guarantees Convergence to a Local Optimum

convergence, expectation-maximization, missing data

I have read a couple of explanations of the EM algorithm (e.g. from Bishop's Pattern Recognition and Machine Learning and from Rogers and Girolami's A First Course in Machine Learning). The derivation of EM is fine, I understand it. I also understand why the algorithm converges to something: at each step we improve the result, and the likelihood is bounded by 1.0, so by a simple fact (if a function is increasing and bounded, then it converges) we know that the algorithm converges to some solution.
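
For concreteness, here is a minimal sketch of what I mean (my own code, with made-up synthetic data, not taken from either book): EM for a one-dimensional two-component Gaussian mixture, printing the log-likelihood each iteration so the monotone increase is visible.

```python
# Minimal EM sketch for a 1-D mixture of two Gaussians (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Arbitrary initial guesses for weights, means, variances.
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def log_likelihood(x, pi, mu, var):
    # log p(x) = sum_n log sum_k pi_k * N(x_n | mu_k, var_k)
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

for it in range(20):
    # E-step: responsibilities r[n, k] = p(z_n = k | x_n)
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the responsibilities
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

    # The printed value never decreases from one iteration to the next.
    print(f"iter {it:2d}  log-likelihood = {log_likelihood(x, pi, mu, var):.4f}")
```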

However, how do we know that it is a local maximum? At each step we optimize over only one set of coordinates (either the latent variables or the parameters), so we might miss something, e.g. a case where reaching the local maximum requires moving along both sets of coordinates at once.

I believe this is similar to the problem faced by the general class of hill-climbing algorithms, of which EM is an instance. For a general hill-climbing algorithm, the function f(x, y) = x*y illustrates the problem: if we start from the point (0, 0), then only by considering both directions at once are we able to move upwards from the value 0 (see the small illustration below).
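
Here is a tiny illustration of that point (again my own, not from any reference): at (0, 0), moving along either axis alone leaves f(x, y) = x*y at 0, while a diagonal step increases it.

```python
# Coordinate-wise moves at (0, 0) cannot increase f(x, y) = x * y,
# but a joint (diagonal) move can.
def f(x, y):
    return x * y

step = 0.1
print(f(0.0 + step, 0.0))          # move in x only  -> 0.0
print(f(0.0, 0.0 + step))          # move in y only  -> 0.0
print(f(0.0 + step, 0.0 + step))   # move diagonally -> 0.01 > 0
```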

Best Answer

EM is not guaranteed to converge to a local optimum. It is only guaranteed to converge to a point with zero gradient with respect to the parameters (a stationary point of the likelihood), so it can indeed get stuck at saddle points.
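
A sketch of how this can happen (my own illustration, assuming a Gaussian mixture like the one in the question): if both mixture components are initialized identically, the responsibilities are exactly 0.5 for every point, the M-step keeps the two components identical, and EM never separates them, even though splitting the components would raise the likelihood. Such a symmetric fixed point is a stationary point of the likelihood but typically not a local maximum.

```python
# EM parked at a stationary point that is not a local maximum: symmetric init.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

# Both components start with the same mean and variance.
pi, mu, var = np.array([0.5, 0.5]), np.array([0.0, 0.0]), np.array([1.0, 1.0])

for it in range(10):
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)   # exactly 0.5 everywhere
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk       # both means stay equal
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

# The two means remain identical (roughly the overall sample mean),
# not the better solution with components near -3 and +3.
print("means:", mu)
```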