Solved – Hidden Markov models and expectation maximization algorithm

expectation-maximization, hidden-markov-model, markov-process

Can somebody clarify how hidden Markov models are related to expectation maximization? I have gone through many links but still couldn't get a clear picture.

Thanks!

Best Answer

The EM algorithm (expectation maximization) is a general algorithm for optimization of the likelihood function in cases where the model is specified probabilistically in terms of an observed and an unobserved (latent) component. HMMs (hidden Markov models) are models of this form because they have an unobserved component, the hidden states, and the actual observations are often called the emissions in the HMM terminology. Hence, HMMs form a class of models for which the EM algorithm can be useful.
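
For concreteness, here is a minimal sketch of such a model in Python (NumPy); the two-state, two-emission numbers are purely illustrative, not canonical:

```python
# A minimal sketch of a discrete HMM, just to fix notation; the
# two-state, two-emission numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])        # initial distribution over the hidden states
A = np.array([[0.7, 0.3],        # A[i, j] = P(next hidden state j | state i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],        # B[i, k] = P(emission k | hidden state i)
              [0.3, 0.7]])

def sample(T):
    """Sample a hidden state path and its emissions; only the
    emissions would be observed in practice."""
    states, emissions = [], []
    s = rng.choice(2, p=pi)
    for _ in range(T):
        states.append(s)
        emissions.append(rng.choice(2, p=B[s]))
        s = rng.choice(2, p=A[s])
    return states, emissions

hidden, observed = sample(10)    # `hidden` is the latent component, `observed` the emissions
```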

In general, if the model consists of two components $(X,Y)$, which we assume take values in a finite space for simplicity, and if the probabilistic model specification consists of the joint point probabilities $p_{\theta}(x,y)$, parametrized by $\theta$, then the likelihood when observing only $X = x$ is $$L_x(\theta) = \sum_{y} p_{\theta}(x,y).$$ Though the sum looks innocent, it is not: for HMMs the sum is over all possible sequences of hidden states, and the number of such sequences grows exponentially with the length of the observed sequence ($K^T$ paths for $K$ hidden states and $T$ observations). Fortunately, there are algorithms for HMMs (forward-backward) that compute the likelihood fast, and the likelihood could then, in principle, be plugged into any general-purpose optimization algorithm for maximum-likelihood estimation of $\theta$ (a minimal sketch of the forward recursion is given after the list below). The alternative is the EM-algorithm. This is an algorithm that iteratively alternates between

  • the E-step, which computes the conditional expectation of the complete-data log-likelihood given the observed $x$, under the current estimate of $\theta$
  • the M-step, which maximizes this expected log-likelihood over $\theta$
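
Here is the minimal sketch of the forward recursion promised above, computing $L_x(\theta)$ by dynamic programming rather than by summing over all hidden paths; the parameters and observation sequence are illustrative, and a real implementation would scale the alphas or work in log-space to avoid underflow on long sequences:

```python
# A minimal sketch of the forward recursion for L_x(theta);
# pi, A, B illustrative, no scaling against numerical underflow.
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])

def forward(obs):
    """alpha[t, i] = P(x_1, ..., x_t, hidden state i at time t)."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

obs = [0, 0, 1, 0, 1]
likelihood = forward(obs)[-1].sum()  # O(T * K^2) work instead of K^T terms
print(likelihood)
```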

The EM-algorithm makes most sense if the two steps above can be implemented in a computationally efficient way, e.g. when we have closed-form expressions for the conditional expectation and the maximization. For HMMs this is precisely the case: the E-step is carried out by the forward-backward algorithm, and the M-step reduces to normalizing expected counts of transitions and emissions.
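
For concreteness, here is a minimal sketch of one such iteration for the HMM above, i.e. one step of the Baum-Welch algorithm mentioned below; parameters and data are again illustrative, and the code skips the scaling a robust implementation would need:

```python
# A minimal sketch of one EM (Baum-Welch) iteration; illustrative
# parameters, no numerical scaling.
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
obs = np.array([0, 0, 1, 0, 1])
T, K = len(obs), len(pi)

# E-step: forward-backward gives the needed conditional expectations.
alpha = np.zeros((T, K))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
beta = np.ones((T, K))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

L = alpha[-1].sum()                # likelihood under the current theta
gamma = alpha * beta / L           # gamma[t, i] = P(state i at time t | x)
xi = np.zeros((T - 1, K, K))       # xi[t, i, j] = P(state i at t, j at t+1 | x)
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / L

# M-step: closed-form maximization = normalizing expected counts.
pi_new = gamma[0]
A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
B_new = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
B_new /= gamma.sum(axis=0)[:, None]
```

Iterating these two steps produces parameter estimates whose likelihood values never decrease, which is exactly the monotonicity property mentioned below.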

Historically, the general EM-algorithm is credited to Dempster, Laird and Rubin, who proved in their 1977 paper, among other things, that the algorithm leads to a sequence of parameters with monotonically increasing likelihood values. They also coined the term "EM-algorithm". Interestingly, the EM-algorithm for HMMs was already described in 1970 by Baum et al., and it is often referred to as the Baum-Welch algorithm in the HMM literature (I don't know precisely what Welch did ...).