I am studying hidden Markov models, but I have some doubts about the inference phase. If I have a set of observations and I want to estimate the three parameters that characterize the model (the start probabilities, the transition probabilities, and the emission probabilities), can I apply an MCMC technique directly to the observations, or do I first have to run the Viterbi algorithm or the forward-backward algorithm on the observations and then use an MCMC technique to estimate the three parameters?
Solved – Sampling Hidden Markov Model
hidden markov model
Related Solutions
Let $\mathbf{X}$ be an observation sequence and $\lambda$ be a Hidden Markov Model (HMM). Then the forward algorithm determines $\textrm{Pr}(\mathbf{X}|\lambda)$, the likelihood of realizing sequence $\mathbf{X}$ from HMM $\lambda$.
In plainer English: let's say you trained up your HMM $\lambda$ and you'd like to see how likely it is that $\lambda$ produced some sequence $\mathbf{X}$. The forward algorithm does this for you. If you get a relatively high likelihood, there's a good chance that $\lambda$ produced $\mathbf{X}$. If you had two HMMs $\lambda_1$ and $\lambda_2$, you might conclude that the one with the higher likelihood is the better model for explaining your $\mathbf{X}$.
Let's look at some R code. First we'll initialize an HMM (adapted from the HMM package manual on CRAN: http://cran.r-project.org/web/packages/HMM/HMM.pdf):

```r
require('HMM')
# Initialise the HMM: hidden states A/B, observed symbols L/R
hmm = initHMM(c("A","B"), c("L","R"),
              transProbs=matrix(c(.8,.2,.2,.8),2),
              emissionProbs=matrix(c(.6,.4,.4,.6),2))
print(hmm)
# Sequence of observations
observations = c("L","L","R","R")
```
This is an HMM which has an 80% chance of staying in whatever hidden state it was in at time $t$ when it transitions to time $t+1$. It has two hidden states, A and B, and it emits two symbols, L and R. The emission probabilities are contained in emissionProbs, and we store the observation sequence $\mathbf{X}$ in observations.
```r
logForwardProbabilities = forward(hmm,observations)
forwardProbabilities = exp(logForwardProbabilities)
```

forwardProbabilities is now a matrix containing the probability of realizing the observation sequence up to time $t$ and ending up in state $i$; this is the forward variable $\alpha_t(i)$. Hidden states $i$ are in rows and times $t$ are in columns. If we want the probability of the overall sequence, we use column 4 because it is the last timestep. The resulting 2-element vector contains the joint probabilities of realizing the whole sequence and ending in A or ending in B; their sum is the total probability of realizing $\mathbf{X}$.

```r
finalAnswer = sum(forwardProbabilities[,4])
```
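For reference, the standard forward recursion that forward computes (in log space), with $\pi_i$ the start probabilities, $a_{ij}$ the transition probabilities, and $b_j(\cdot)$ the emission probabilities, is

$$\alpha_1(i) = \pi_i \, b_i(x_1), \qquad \alpha_{t+1}(j) = \Big[\sum_i \alpha_t(i)\, a_{ij}\Big] b_j(x_{t+1}), \qquad \Pr(\mathbf{X}\mid\lambda) = \sum_i \alpha_T(i).$$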
A bit of background first; maybe it clears things up a bit.
When talking about HMMs (Hidden Markov Models), there are generally three problems to be considered:
Evaluation problem
- The evaluation problem answers the question: what is the probability that a particular sequence of symbols is produced by a particular model?
- For evaluation we use one of two algorithms: the forward algorithm or the backward algorithm (DO NOT confuse them with the forward-backward algorithm); a short sketch follows below.
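A minimal sketch with the HMM package (the two-state model is illustrative, the same shape as in the earlier answer). The forward variant was shown above; here is evaluation via the backward variables:

```r
require('HMM')
# Illustrative two-state model: hidden states A/B, symbols L/R
hmm = initHMM(c("A","B"), c("L","R"),
              transProbs=matrix(c(.8,.2,.2,.8),2),
              emissionProbs=matrix(c(.6,.4,.4,.6),2))
observations = c("L","L","R","R")
# backward() returns the log backward probabilities log beta_t(i)
beta = exp(backward(hmm, observations))
# Evaluation via the backward variables:
# Pr(X) = sum_i startProbs[i] * emissionProbs[i, x_1] * beta_1(i)
likelihood = sum(hmm$startProbs * hmm$emissionProbs[, observations[1]] * beta[, 1])
likelihood  # same value the forward algorithm gives
```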
Decoding problem
- The decoding problem answers the question: given a sequence of symbols (your observations) and a model, what is the most likely sequence of hidden states that produced the sequence?
- For decoding we use the Viterbi algorithm (see the sketch just below).
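With the HMM package, decoding is a single call; a minimal sketch reusing the hmm and observations objects from the sketch above:

```r
# Most likely hidden-state path for the observation sequence (Viterbi decoding)
viterbiPath = viterbi(hmm, observations)
print(viterbiPath)  # a vector of states ("A"/"B"), one per observation
```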
Training problem
- The training problem answers the question: given a model structure and a set of observation sequences, which model parameters best fit the data?
- For this problem we can use the following three approaches:
- MLE (maximum likelihood estimation)
- Viterbi training (DO NOT confuse it with Viterbi decoding)
- Baum-Welch, i.e. the forward-backward algorithm
To sum it up: you use the Viterbi algorithm for the decoding problem, and Baum-Welch/forward-backward when you train your model on a set of sequences.
Baum-Welch works in the following way (a minimal code example follows the steps below). For each sequence in the training set of sequences:
- Calculate forward probabilities with the forward algorithm
- Calculate backward probabilities with the backward algorithm
- Calculate the contributions of the current sequence to the transition probabilities of the model, and its contributions to the emission probabilities of the model.
- Calculate the new model parameters (start probabilities, transition probabilities, emission probabilities)
- Calculate the new log likelihood of the model
- Stop when the change in log likelihood is smaller than a given threshold or when a maximum number of iterations is reached.
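Here is a minimal sketch with the HMM package; the starting model and the training observations are made up for illustration, and baumWelch re-estimates the transition and emission probabilities from the data:

```r
require('HMM')
# A rough starting guess (made-up numbers; avoid perfectly uniform values,
# since Baum-Welch cannot move away from that symmetric fixed point)
hmmGuess = initHMM(c("A","B"), c("L","R"),
                   transProbs=matrix(c(.7,.3,.3,.7),2),
                   emissionProbs=matrix(c(.55,.45,.45,.55),2))
# A made-up training sequence of observations
trainObs = c("L","L","L","R","R","R","R","L","L","R")
# Iterate forward/backward re-estimation until convergence
fit = baumWelch(hmmGuess, trainObs, maxIterations=100, delta=1e-9)
print(fit$hmm)  # the re-estimated model
```

Note that baumWelch in this package trains on a single observation sequence; for a set of sequences you would accumulate the forward/backward contributions across all sequences, as described in the steps above.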
If you need a full description of the equations for Viterbi decoding and the training algorithm let me know and I can point you in the right direction.
Best Answer
The question is not clear to me, but if you want to sample from an HMM, forward sampling can be used. Assume $X$ are the hidden states and $Y$ are the observations, and we want to sample $N$ observations.
The steps are:
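- Sample the initial hidden state $X_1$ from the start distribution.
- Sample $Y_1$ from the emission distribution of state $X_1$.
- For $t = 2, \dots, N$: sample $X_t$ from the transition distribution of $X_{t-1}$, then sample $Y_t$ from the emission distribution of $X_t$.

A minimal base-R sketch of these steps, reusing the illustrative two-state model from the answers above (the start probabilities are assumed uniform here):

```r
set.seed(1)  # for reproducibility
states  = c("A","B")
symbols = c("L","R")
startProbs    = c(.5,.5)                               # Pr(X_1), assumed uniform
transProbs    = matrix(c(.8,.2,.2,.8), nrow=2,
                       dimnames=list(states, states))  # rows are the from-state
emissionProbs = matrix(c(.6,.4,.4,.6), nrow=2,
                       dimnames=list(states, symbols)) # rows are the current state

N = 10
x = character(N)  # hidden states
y = character(N)  # observations
x[1] = sample(states, 1, prob=startProbs)              # initial state
y[1] = sample(symbols, 1, prob=emissionProbs[x[1],])   # first emission
for (t in 2:N) {
  x[t] = sample(states, 1, prob=transProbs[x[t-1],])   # transition step
  y[t] = sample(symbols, 1, prob=emissionProbs[x[t],]) # emission step
}
y  # the sampled observation sequence
```

The HMM package wraps the same procedure as simHMM(hmm, 10), which returns both the simulated hidden states and the observations.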