I am studying hidden Markov models, but I have some doubts about the inference phase. If I have a set of observations and I want to estimate the three parameters that characterize the model (the start probabilities, the transition probabilities, and the emission probabilities), can I apply an MCMC technique directly to the observations, or do I first have to run the Viterbi algorithm or the forward-backward algorithm on the observations and then use an MCMC technique to estimate the three parameters?
Solved – Sampling Hidden Markov Model
hidden markov model
Related Solutions
Let $\mathbf{X}$ be an observation sequence and $\lambda$ be a Hidden Markov Model (HMM). Then the forward algorithm determines $\textrm{Pr}(\mathbf{X}|\lambda)$, the likelihood of realizing sequence $\mathbf{X}$ from HMM $\lambda$.
In plainer English: let's say you trained up your HMM $\lambda$ and you'd like to see how likely it is that $\lambda$ produced some sequence $\mathbf{X}$. The forward algorithm does this for you. If you get a relatively high likelihood, there's a good chance that $\lambda$ produced $\mathbf{X}$. If you had two HMMs $\lambda_1$ and $\lambda_2$, you might conclude that the one with the higher likelihood is the better model for explaining your $\mathbf{X}$.
Let's look at some R code. First we'll initialize an HMM (adapted from the HMM package manual on CRAN: http://cran.r-project.org/web/packages/HMM/HMM.pdf):

```r
require('HMM')
# Initialise the HMM: hidden states A/B, observed symbols L/R
hmm = initHMM(c("A","B"), c("L","R"),
              transProbs=matrix(c(.8,.2,.2,.8),2),
              emissionProbs=matrix(c(.6,.4,.4,.6),2))
print(hmm)
# Sequence of observations
observations = c("L","L","R","R")
```
This is an HMM which has an 80% chance of staying in whatever hidden state it was in at time $t$ when it transitions to time $t+1$. It has two hidden states, A and B, and it emits two symbols, L and R. The emission probabilities are contained in emissionProbs, and we store the observation sequence $\mathbf{X}$ in observations.
```r
logForwardProbabilities = forward(hmm,observations)
forwardProbabilities = exp(logForwardProbabilities)
```

forwardProbabilities is now a matrix containing the probability of realizing the observation sequence up to time $t$ and ending up in state $i$; this is the forward variable $\alpha_t(i)$. Hidden states $i$ are in rows and times $t$ are in columns. If we want the probability of the overall sequence, we use column 4 because it is the last timestep. The resulting 2-element vector contains the joint probabilities of realizing the whole sequence and ending in A or ending in B; their sum is the total probability of realizing $\mathbf{X}$.

```r
finalAnswer = sum(forwardProbabilities[,4])
```
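For reference, the standard forward recursion that forward computes (in log space), with $\pi_i$ the start probabilities, $a_{ij}$ the transition probabilities, and $b_j(\cdot)$ the emission probabilities, is

$$\alpha_1(i) = \pi_i \, b_i(x_1), \qquad \alpha_{t+1}(j) = \Big[\sum_i \alpha_t(i)\, a_{ij}\Big] b_j(x_{t+1}), \qquad \Pr(\mathbf{X}\mid\lambda) = \sum_i \alpha_T(i).$$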
A bit of background first; maybe it clears things up a bit.
When talking about HMMs (Hidden Markov Models), there are generally three problems to be considered:
Evaluation problem
- The evaluation problem answers the question: what is the probability that a particular sequence of symbols is produced by a particular model?
- For evaluation we use one of two algorithms: the forward algorithm or the backward algorithm (DO NOT confuse them with the forward-backward algorithm); a short sketch follows below.
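A minimal sketch with the HMM package (the two-state model is illustrative, the same shape as in the earlier answer). The forward variant was shown above; here is evaluation via the backward variables:

```r
require('HMM')
# Illustrative two-state model: hidden states A/B, symbols L/R
hmm = initHMM(c("A","B"), c("L","R"),
              transProbs=matrix(c(.8,.2,.2,.8),2),
              emissionProbs=matrix(c(.6,.4,.4,.6),2))
observations = c("L","L","R","R")
# backward() returns the log backward probabilities log beta_t(i)
beta = exp(backward(hmm, observations))
# Evaluation via the backward variables:
# Pr(X) = sum_i startProbs[i] * emissionProbs[i, x_1] * beta_1(i)
likelihood = sum(hmm$startProbs * hmm$emissionProbs[, observations[1]] * beta[, 1])
likelihood  # same value the forward algorithm gives
```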
Decoding problem
- The decoding problem answers the question: given a sequence of symbols (your observations) and a model, what is the most likely sequence of hidden states that produced the sequence?
- For decoding we use the Viterbi algorithm (see the sketch just below).
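With the HMM package, decoding is a single call; a minimal sketch reusing the hmm and observations objects from the sketch above:

```r
# Most likely hidden-state path for the observation sequence (Viterbi decoding)
viterbiPath = viterbi(hmm, observations)
print(viterbiPath)  # a vector of states ("A"/"B"), one per observation
```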
Training problem
- The training problem answers the question: given a model structure and a set of observation sequences, which model parameters best fit the data?
- For this problem we can use the following three approaches:
- MLE (maximum likelihood estimation)
- Viterbi training (DO NOT confuse it with Viterbi decoding)
- Baum-Welch, i.e. the forward-backward algorithm
To sum it up: you use the Viterbi algorithm for the decoding problem, and Baum-Welch/forward-backward when you train your model on a set of sequences.
Baum-Welch works in the following way (a minimal code example follows the steps below). For each sequence in the training set of sequences:
- Calculate forward probabilities with the forward algorithm
- Calculate backward probabilities with the backward algorithm
- Calculate the contributions of the current sequence to the transition probabilities of the model, and its contributions to the emission probabilities of the model.
- Calculate the new model parameters (start probabilities, transition probabilities, emission probabilities)
- Calculate the new log likelihood of the model
- Stop when the change in log likelihood is smaller than a given threshold or when a maximum number of iterations is reached.
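Here is a minimal sketch with the HMM package; the starting model and the training observations are made up for illustration, and baumWelch re-estimates the transition and emission probabilities from the data:

```r
require('HMM')
# A rough starting guess (made-up numbers; avoid perfectly uniform values,
# since Baum-Welch cannot move away from that symmetric fixed point)
hmmGuess = initHMM(c("A","B"), c("L","R"),
                   transProbs=matrix(c(.7,.3,.3,.7),2),
                   emissionProbs=matrix(c(.55,.45,.45,.55),2))
# A made-up training sequence of observations
trainObs = c("L","L","L","R","R","R","R","L","L","R")
# Iterate forward/backward re-estimation until convergence
fit = baumWelch(hmmGuess, trainObs, maxIterations=100, delta=1e-9)
print(fit$hmm)  # the re-estimated model
```

Note that baumWelch in this package trains on a single observation sequence; for a set of sequences you would accumulate the forward/backward contributions across all sequences, as described in the steps above.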
If you need a full description of the equations for Viterbi decoding and the training algorithm let me know and I can point you in the right direction.
Best Answer
The question is not clear to me, but if you want to sample from an HMM, forward sampling can be used. Assume $X$ are the hidden states and $Y$ are the observations, and we want to sample $N$ observations.
The steps are:
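- Sample the initial hidden state $X_1$ from the start distribution.
- Sample $Y_1$ from the emission distribution of state $X_1$.
- For $t = 2, \dots, N$: sample $X_t$ from the transition distribution of $X_{t-1}$, then sample $Y_t$ from the emission distribution of $X_t$.

A minimal base-R sketch of these steps, reusing the illustrative two-state model from the answers above (the start probabilities are assumed uniform here):

```r
set.seed(1)  # for reproducibility
states  = c("A","B")
symbols = c("L","R")
startProbs    = c(.5,.5)                               # Pr(X_1), assumed uniform
transProbs    = matrix(c(.8,.2,.2,.8), nrow=2,
                       dimnames=list(states, states))  # rows are the from-state
emissionProbs = matrix(c(.6,.4,.4,.6), nrow=2,
                       dimnames=list(states, symbols)) # rows are the current state

N = 10
x = character(N)  # hidden states
y = character(N)  # observations
x[1] = sample(states, 1, prob=startProbs)              # initial state
y[1] = sample(symbols, 1, prob=emissionProbs[x[1],])   # first emission
for (t in 2:N) {
  x[t] = sample(states, 1, prob=transProbs[x[t-1],])   # transition step
  y[t] = sample(symbols, 1, prob=emissionProbs[x[t],]) # emission step
}
y  # the sampled observation sequence
```

The HMM package wraps the same procedure as simHMM(hmm, 10), which returns both the simulated hidden states and the observations.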