Solved – The relationship between expectation-maximization and majorization-minimization

expectation-maximization, optimization

I wonder about the relationship between two methods: expectation-maximization (EM) and majorization-minimization (MM). The EM algorithm can be used to find the mode of a likelihood or posterior distribution by introducing latent variables. In Bishop's book (PRML), he claims that the new objective involving the latent variables is easier to optimize. Majorization-minimization methods rest on the same idea: one introduces a sequence of surrogate cost functions, each easy to optimize, that iteratively approach the original cost function, which is hard to optimize directly.

Let us take the following model: $$ y = x + \eta$$ where $\eta$ is a Gaussian random variable. Maximum likelihood estimation in this model corresponds to optimizing the quadratic term $\|y -x \|_2^2$. This functional is easy to optimize; however, there are more complicated models which are not easy to optimize, and hence one introduces an alternative cost function which is optimized iteratively in place of the original one (majorization-minimization). The EM algorithm achieves a similar goal but introduces probabilistic notions such as latent variables.
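To spell out the claim (assuming, say, $\eta \sim \mathcal{N}(0, \sigma^2 I)$, which the question leaves unspecified):

$$ -\log p(y \mid x) = \frac{1}{2\sigma^2}\,\|y - x\|_2^2 + \text{const}, $$

so maximizing the likelihood over $x$ is the same as minimizing the quadratic term.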

I wonder whether these two algorithms are equivalent in some special cases. Or, if they are completely different, what is the relationship between them? Thanks!

Best Answer

The general idea of minorize-maximize (MM) algorithms is: (a) approximate the target function from below with a dominated (minorant) function; (b) climb to the maximum of the approximating function; (c) return to (a), approximating at the new coordinate.
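Here is a minimal numerical sketch of that loop (my own example, not from the answer), written in the mirror-image majorize-minimize form from the question's title: minimizing $f(x) = \sum_i |y_i - x|$ by repeatedly majorizing each absolute value with a quadratic that touches it at the current iterate.

```python
import numpy as np

def mm_median(y, x0=None, n_iters=50, eps=1e-8):
    """Majorize-minimize for f(x) = sum_i |y_i - x| (the sample median).

    Step (a): at the current iterate x_k, majorize each |y_i - x| by
    (y_i - x)^2 / (2 |y_i - x_k|) + |y_i - x_k| / 2, a quadratic that
    dominates |y_i - x| everywhere and touches it at x = x_k.
    Step (b): minimize the sum of quadratics in closed form (weighted mean).
    Step (c): repeat at the new iterate.
    """
    x = np.mean(y) if x0 is None else x0
    for _ in range(n_iters):
        # eps is a standard practical regularization to avoid division by zero
        w = 1.0 / (np.abs(y - x) + eps)   # weights coming from the majorizer
        x = np.sum(w * y) / np.sum(w)     # minimizer of the quadratic surrogate
    return x

y = np.array([1.0, 2.0, 3.0, 10.0])
print(mm_median(y))  # approaches a minimizer of f (any point in [2, 3] here)
```

Each iteration can only decrease $f$, because the surrogate sits above $f$ and agrees with it at the current point; this descent (or, in the minorize-maximize form, ascent) property is the defining feature of MM.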

The "flavor" of the MM is given by the approximating method, and the climbing method. EM is a particular instance in which a convex combination of the target function at different points is used or approximation. This approximation is dominated by the target function via Jenessen's-Inequality (JI). Convexity alone still allows an endless amount of approximating functions. It turns out that (given regularity assumptions on the target function) there is a single convex combination of the target function, which is also tangent at the approximating point. This function, happens to have a nice probabilistic interpretation, and it know as the E step of the EM.

In conclusion, EM is indeed a particular instance of MM, although to the best of my knowledge, EM came first and the generalization came later.
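To make the "EM is MM" point concrete, here is a small self-contained sketch (my own example, not from the answer): EM for a two-component Gaussian mixture with unit variances, with an assertion checking the MM ascent property that the log-likelihood never decreases.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data from a two-component Gaussian mixture with unit variances
y = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

def log_lik(y, pi, mu):
    """Log-likelihood of the mixture pi*N(mu[0],1) + (1-pi)*N(mu[1],1)."""
    mix = pi * np.exp(-0.5 * (y - mu[0])**2) \
        + (1 - pi) * np.exp(-0.5 * (y - mu[1])**2)
    return np.sum(np.log(mix / np.sqrt(2 * np.pi)))

pi, mu = 0.5, np.array([-1.0, 1.0])
prev = -np.inf
for it in range(100):
    # E step: responsibilities = posterior over the latent component label;
    # this builds the tangent minorant of the log-likelihood (Jensen)
    a = pi * np.exp(-0.5 * (y - mu[0])**2)
    b = (1 - pi) * np.exp(-0.5 * (y - mu[1])**2)
    r = a / (a + b)
    # M step: maximize the minorant in closed form
    pi = r.mean()
    mu = np.array([np.sum(r * y) / np.sum(r),
                   np.sum((1 - r) * y) / np.sum(1 - r)])
    ll = log_lik(y, pi, mu)
    assert ll >= prev - 1e-9  # MM ascent: the likelihood never decreases
    prev = ll

print(pi, mu)  # close to the true weight 0.4 and means (-2, 3)
```

The assertion holds at every iteration precisely because the E step constructs a minorant that is tangent to the log-likelihood at the current parameters, so maximizing it in the M step can only move uphill on the original objective.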