Solved – Hidden Markov model (forward algorithm) in R

hidden markov modelr

I have studied some of these resources and I know that there is an R package called HMM. Could anybody explain the usefulness of the 'forward algorithm' with a simple example in R?

Finally, I find a good book which is

Hidden Markov Models for Time Series: An Introduction Using R (Chapman & Hall CRC Monographs on Statistics & Applied Probability)
Walter Zucchini, Iain L. MacDonald

Best Answer

Let $\mathbf{X}$ be an observation sequence and $\lambda$ be a Hidden Markov Model (HMM). Then the forward algorithm determines $\textrm{Pr}(\mathbf{X}|\lambda)$, the likelihood of realizing sequence $\mathbf{X}$ from HMM $\lambda$.

In more plain English terms... Let's say you trained up your HMM $\lambda$ and you'd like to see how likely it is that $\lambda$ produced some sequence $\mathbf{X}$. The forward algorithm would do this for you. If you get a relatively high likelihood, there's a good chance that $\lambda$ produced $\mathbf{X}$. If you had two HMMs $\lambda_1$ and $\lambda_2$, you might conclude the one with the higher likelihood is the best model to explain your $\mathbf{X}$.

Let's look at some R code...First we'll initialize an HMM. (I got this from the HMM manuals on CRAN http://cran.r-project.org/web/packages/HMM/HMM.pdf)....

require('HMM')
# Initialise HMM
hmm = initHMM(c("A","B"), c("L","R"), transProbs=matrix(c(.8,.2,.2,.8),2), emissionProbs=matrix(c(.6,.4,.4,.6),2))    
print(hmm)
# Sequence of observations
observations = c("L","L","R","R")

This is an HMM in which has an 80% chance of staying in whatever hidden state it was in at time $t$ when it transitions to time $t+1$. It has two hidden states, A and B. It emits two observations, L and R. The emission probabilities are contained in emissionProbs. We store the observation sequence $\mathbf{X}$ in observations.

logForwardProbabilities = forward(hmm,observations)
forwardProbabilities = exp(logForwardProbabilities)

forwardProbabilities is now a matrix containing the probability of realizing the observation sequence up to time $t$ and ending up in state $i$. Hidden states $i$ are in rows and times $t$ are in columns. If we want the probability of the overall sequence, we use column 4 because it is the last timestep. The resulting 2 element row vector contains the probabilities of ending in A and ending in B. Their sum is the total probability of realizing $\mathbf{X}$.

finalAnswer = sum(forwardProbabilities[,4])
Related Question