Solved – EM to Variational EM in LDA

inference, machine learning, topic-models

Why exactly, when learning the hidden-variable distributions in LDA (Latent Dirichlet Allocation), can one not use the plain EM (Expectation Maximization) algorithm, instead typically resorting to variational EM (or a sampling method)?

Variational inference was the original method proposed for parameter inference in the 2003 paper by Blei et al.

Best Answer

I do not see why you cannot use EM for LDA. To apply EM to LDA: in the E-step, you fix $\theta$ (the per-document topic distribution) and $\phi$ (the per-topic word distribution) and compute the posterior $q(z)=p(z|x,\theta,\phi)$, where $z$ is the topic assignment of each word. This is tractable because, given $\theta$ and $\phi$, the topic assignments are conditionally independent across words, so $q(z)$ factorizes into simple discrete distributions, $q(z_n=k)\propto\theta_k\,\phi_{k,x_n}$. In the M-step, you update $\theta$ and $\phi$ to maximize the expected log likelihood, where the expectation is taken under $q(z)$.
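As a minimal sketch of what that EM loop might look like (assuming a bag-of-words corpus given as lists of word indices; `em_lda` and its arguments are illustrative names, not code from the paper):

```python
import numpy as np

def em_lda(docs, n_topics, n_vocab, n_iters=50, seed=0):
    """Plain EM for LDA, treating theta (per-document topic mixture) and
    phi (per-topic word distribution) as point-estimated parameters, as
    described above. `docs` is a list of 1-D integer arrays of word
    indices, one array per document."""
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(np.ones(n_topics), size=len(docs))  # D x K
    phi = rng.dirichlet(np.ones(n_vocab), size=n_topics)      # K x V

    for _ in range(n_iters):
        theta_new = np.zeros_like(theta)
        phi_new = np.zeros_like(phi)
        for d, words in enumerate(docs):
            # E-step: q(z_n = k) = p(z_n = k | x_n, theta, phi)
            #                    proportional to theta[d, k] * phi[k, x_n]
            q = theta[d][:, None] * phi[:, words]             # K x N_d
            q /= q.sum(axis=0, keepdims=True)
            # Accumulate expected counts feeding the M-step.
            theta_new[d] = q.sum(axis=1)
            np.add.at(phi_new.T, words, q.T)                  # handles repeated words
        # M-step: maximizing the expected log likelihood amounts to
        # renormalizing the expected counts.
        theta = theta_new / theta_new.sum(axis=1, keepdims=True)
        phi = phi_new / phi_new.sum(axis=1, keepdims=True)
    return theta, phi

# Toy usage: 3 tiny documents over a 5-word vocabulary, 2 topics.
docs = [np.array([0, 0, 1, 2]), np.array([3, 4, 4]), np.array([0, 1, 4])]
theta, phi = em_lda(docs, n_topics=2, n_vocab=5)
```

If you add Dirichlet priors with parameters at least 1, the M-step becomes MAP estimation: add $\alpha_k - 1$ (respectively $\beta_v - 1$) pseudo-counts to the expected counts before renormalizing.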

Of course, if you use less-than-one hyperparameters for the Dirichlet priors, then you cannot use EM this way: in the M-step, the objective would include the Dirichlet log densities over $\theta$ and $\phi$, and when some Dirichlet parameters are less than 1 the density becomes infinite at the corners/edges of the simplex (and the posterior can be multimodal), so the maximization is ill-posed, as shown below.
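Concretely, the Dirichlet density over $\theta$ (and analogously over $\phi$) is

$$p(\theta \mid \alpha) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \qquad \theta_k \ge 0,\ \sum_k \theta_k = 1.$$

If some $\alpha_k < 1$, the exponent $\alpha_k - 1$ is negative, so the density diverges as $\theta_k \to 0$, i.e., on the boundary of the simplex. The M-step objective (expected log likelihood plus these log priors) is then unbounded above, and the maximization has no solution.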