The difference between the forward-backward and Viterbi algorithms

Tags: algorithms, forward-backward, hidden-markov-model, viterbi-algorithm

I want to know the differences between the forward-backward algorithm and the Viterbi algorithm for inference in hidden Markov models (HMMs).

Best Answer

A bit of background first; it may clear things up.

When talking about HMMs (hidden Markov models), there are generally three problems to consider:

  1. Evaluation problem

    • The evaluation problem answers the question: what is the probability that a particular sequence of symbols is produced by a particular model?
    • For evaluation we use one of two algorithms: the forward algorithm or the backward algorithm (DO NOT confuse them with the forward-backward algorithm); a sketch of the forward algorithm follows this list.
  2. Decoding problem

    • The decoding problem answers the question: given a sequence of symbols (your observations) and a model, what is the most likely sequence of states that produced the sequence?
    • For decoding we use the Viterbi algorithm; a sketch appears after the summary below.
  3. Training problem

    • The training problem answers the question: given a model structure and a set of sequences, which model parameters best fit the data?
    • For this problem we can use the following three algorithms:
      1. MLE (maximum likelihood estimation)
      2. Viterbi training (DO NOT confuse it with Viterbi decoding)
      3. Baum-Welch = the forward-backward algorithm
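
As a concrete illustration of the evaluation problem, here is a minimal sketch of the forward algorithm. It is a sketch under stated assumptions, not a reference implementation: it assumes a discrete HMM given as NumPy arrays `start` (initial state probabilities), `trans` (state-transition matrix, rows = from-state) and `emit` (states × symbols emission matrix); these names are illustrative.

```python
import numpy as np

def forward(obs, start, trans, emit):
    """alpha[t, i] = P(o_1, ..., o_t, state_t = i | model)."""
    alpha = np.zeros((len(obs), len(start)))
    alpha[0] = start * emit[:, obs[0]]                 # initialise with the first symbol
    for t in range(1, len(obs)):
        # sum over all predecessor states, then weight by the emission probability
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    return alpha

# The evaluation answer, P(observations | model), is the sum over final states:
# forward(obs, start, trans, emit)[-1].sum()
```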

To sum up: you use the Viterbi algorithm for the decoding problem, and Baum-Welch (the forward-backward algorithm) when you train your model on a set of sequences.
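
For the decoding problem, here is a matching Viterbi sketch under the same assumptions: structurally it is the forward algorithm with the sum over predecessors replaced by a max, plus back-pointers to recover the best state path.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    """Return the most likely state sequence for the observed symbols."""
    delta = np.zeros((len(obs), len(start)))           # best path probability so far
    psi = np.zeros((len(obs), len(start)), dtype=int)  # back-pointers
    delta[0] = start * emit[:, obs[0]]
    for t in range(1, len(obs)):
        scores = delta[t - 1][:, None] * trans         # scores[i, j]: reach j from i
        psi[t] = scores.argmax(axis=0)                 # best predecessor of each state
        delta[t] = scores.max(axis=0) * emit[:, obs[t]]
    path = [int(delta[-1].argmax())]                   # best final state
    for t in range(len(obs) - 1, 0, -1):               # follow back-pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

In practice both algorithms are usually run in log space to avoid numerical underflow on long sequences.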


Baum-Welch works in the following way (a code sketch follows the steps).

For each sequence in the training set:

  1. Calculate forward probabilities with the forward algorithm
  2. Calculate backward probabilities with the backward algorithm
  3. Calculate the contributions of the current sequence to the model's transition probabilities and to its emission probabilities.
  4. Once all sequences have been processed, calculate the new model parameters (start probabilities, transition probabilities, emission probabilities)
  5. Calculate the new log likelihood of the model
  6. Stop when the change in log likelihood is smaller than a given threshold or when a maximum number of iterations has been reached.
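
The steps above might look like the following sketch for a single sequence, reusing the forward function from the first sketch and the same illustrative array names. A real implementation would accumulate the step-3 contributions over all training sequences before re-estimating in step 4, and would work in log space.

```python
import numpy as np

def backward(obs, trans, emit):
    """beta[t, i] = P(o_{t+1}, ..., o_T | state_t = i)."""
    beta = np.zeros((len(obs), trans.shape[0]))
    beta[-1] = 1.0                                     # base case at the last symbol
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(obs, start, trans, emit):
    alpha = forward(obs, start, trans, emit)           # step 1: forward probabilities
    beta = backward(obs, trans, emit)                  # step 2: backward probabilities
    likelihood = alpha[-1].sum()

    # step 3: expected state occupancies gamma[t, i] and transitions xi[t, i, j]
    gamma = alpha * beta / likelihood
    xi = (alpha[:-1, :, None] * trans[None, :, :]
          * (emit[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood

    # step 4: re-estimate the parameters from the expected counts
    new_start = gamma[0]
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emit = np.zeros_like(emit)
    for t, symbol in enumerate(obs):
        new_emit[:, symbol] += gamma[t]
    new_emit /= gamma.sum(axis=0)[:, None]

    # step 5: the log likelihood, used by the step-6 stopping test
    return new_start, new_trans, new_emit, np.log(likelihood)
```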

If you need a full description of the equations for Viterbi decoding and the training algorithm, let me know and I can point you in the right direction.
