Gradient descent inside the expectation-maximization (EM) algorithm

gradient descent, numerical optimization, optimization, statistics

I am feeling super uncertain about how much I can play around with the EM algorithm. Here is my question:

In the EM algorithm, during the M-step, one attempts to find a parameter value, $\theta$, that maximizes $Q$.

Sometimes $Q$, viewed as a function of $\theta$, is continuous, differentiable, and concave, yet its stationary points have no closed-form solution, so some numerical technique, such as gradient descent or the Newton-Raphson method, is needed to find them.

Am I correct to conclude that one does not break the convergence of the EM algorithm by using these numerical methods to optimize $Q$ with respect to $\theta$?

In fact, am I correct to conclude that one could use any optimization technique in the M step for all the parameters we wish to optimize?
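To make this concrete, here is a toy sketch of what I mean (the model, data, starting values, and step size are all made up for illustration): a two-component Gaussian mixture with known unit variances and equal weights, where I deliberately replace the closed-form M-step update for the means with an inner gradient-ascent loop on $Q$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: equal-weight mixture of N(-2, 1) and N(3, 1).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

def log_lik(mu):
    # Observed-data log-likelihood, up to additive constants shared by both components.
    return np.logaddexp(-0.5 * (x - mu[0])**2, -0.5 * (x - mu[1])**2).sum()

mu = np.array([0.0, 1.0])  # crude starting values
for _ in range(30):
    # E-step: responsibility of component 1 for each point.
    a0, a1 = -0.5 * (x - mu[0])**2, -0.5 * (x - mu[1])**2
    r = np.exp(a1 - np.logaddexp(a0, a1))
    # M-step: maximize Q(mu) numerically by gradient ascent.
    # Q is concave quadratic in mu here, with gradient
    # dQ/dmu0 = sum (1-r)(x - mu0), dQ/dmu1 = sum r (x - mu1).
    for _ in range(200):
        grad = np.array([np.sum((1 - r) * (x - mu[0])),
                         np.sum(r * (x - mu[1]))])
        mu = mu + 1e-3 * grad
```

Even though this particular M-step has a well-known closed form (responsibility-weighted means), the numerical inner loop reaches the same maximizer of $Q$, and the observed-data log-likelihood still improves from one EM iteration to the next.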

Best Answer

Yes, you can use any optimization technique, including numerical, in the M (maximization) step.

In fact, you needn't even maximize: it is enough that the M step improves the objective function $Q$. This is the Generalized EM algorithm, described, for example, in section 7 of "The EM Algorithm As a Lower Bound Optimization Technique" by Rave Harpaz and Robert Haralick. I have substituted your notation for the paper's notation in the quote from that section below.

> In the M-step, the parameters $\theta$ were chosen as the value for which the lower bound of the objective $Q$ was maximized. While this ensures the greatest increase in $Q$ and subsequently the log-likelihood, it is possible to relax the requirement of maximization to one of simply increasing $Q$. This approach to simply increase and not necessarily maximize is known as the Generalized Expectation Maximization (GEM) algorithm.
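The GEM idea can be sketched in a few lines. The following toy example (the mixture model, data, and step size are all made up for illustration) runs EM on a two-component Gaussian mixture with known unit variances and equal weights, but the "M-step" is a single small gradient step on $Q$ rather than a full maximization; the observed-data log-likelihood still increases at every iteration, which is exactly the GEM guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: equal-weight mixture of N(-2, 1) and N(3, 1).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

def log_lik(mu):
    # Observed-data log-likelihood up to additive constants.
    return np.logaddexp(-0.5 * (x - mu[0])**2, -0.5 * (x - mu[1])**2).sum()

mu = np.array([0.0, 1.0])
lls = [log_lik(mu)]
for _ in range(500):
    # E-step: responsibilities for component 1.
    a0, a1 = -0.5 * (x - mu[0])**2, -0.5 * (x - mu[1])**2
    r = np.exp(a1 - np.logaddexp(a0, a1))
    # GEM "M-step": one small gradient step on Q. It merely increases Q,
    # which is all the monotonicity argument requires.
    grad = np.array([np.sum((1 - r) * (x - mu[0])),
                     np.sum(r * (x - mu[1]))])
    mu = mu + 1e-3 * grad
    lls.append(log_lik(mu))
```

With a sufficiently small step size, each gradient step increases the concave function $Q(\cdot \mid \theta_{\text{old}})$, so the log-likelihood sequence `lls` is nondecreasing even though no single M-step is carried to its maximum.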