I am currently studying this paper, and I am having trouble understanding the purpose of the forward-backward algorithm.

First of all, why even have both a forward and a backward pass?

It seems to me that after one has computed the forward variable:

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N}\alpha_t(i)a_{ij}\right] b_j(O_{t+1}), \qquad 1 \le t \le T-1, \quad 1 \le j \le N$$

$$P(O|\lambda) = \sum_{i=1}^N\alpha_T(i)$$

the backward variable is redundant. It is given as:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}b_j(O_{t+1})\beta_{t+1}(j), \qquad t = T-1, T-2, \ldots, 1$$

Both look pretty similar…

One contains the information needed, namely the probability of seeing the observation sequence $O$ given the model $\lambda$; the other doesn't… Does the backward variable even solve the evaluation problem, or is it only the forward variable that does?

Their usage makes sense in the Baum-Welch algorithm… but is that their only purpose?

Another thing I am confused about is why all the initial values for the backward variable are initialised to 1… Do all states have an equal opportunity to be terminating states?

## Best Answer

Both the forward and backward algorithms give you the same final value: the full probability of the observation sequence given the model. It is the scores/probabilities stored in the dynamic programming arrays that differ. As you point out, these values are used in the Baum-Welch re-estimation procedure; they are also used when calculating posterior probabilities. If all you require is the single full-probability value, the forward (or backward) algorithm alone will do.
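To make this concrete, here is a minimal NumPy sketch of both passes on a toy 2-state HMM. All the numbers (`A`, `B`, `pi`, `O`) are made up for illustration; the point is only that summing out the forward variable at $T$ and summing out the backward variable at $t=1$ give the same $P(O|\lambda)$:

```python
import numpy as np

# Toy HMM (hypothetical numbers): 2 hidden states, 3 observation symbols.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # transition probabilities a_ij
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])     # emission probabilities b_j(k)
pi = np.array([0.6, 0.4])           # initial state distribution
O = [0, 1, 2, 1]                    # an example observation sequence
N, T = A.shape[0], len(O)

# Forward pass: alpha[t, i] = P(o_1..o_t, q_t = i | lambda)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]

# Backward pass: beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda)
beta = np.ones((T, N))              # beta_T(i) = 1 for all i
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

p_forward = alpha[-1].sum()                      # sum_i alpha_T(i)
p_backward = (pi * B[:, O[0]] * beta[0]).sum()   # sum_i pi_i b_i(o_1) beta_1(i)
print(np.isclose(p_forward, p_backward))
```

Note that the backward pass does solve the evaluation problem, just read off from the other end: $P(O|\lambda) = \sum_i \pi_i\, b_i(O_1)\, \beta_1(i)$. And the posterior $P(q_t = i \mid O, \lambda)$ that Baum-Welch needs is proportional to $\alpha_t(i)\beta_t(i)$, which is why both arrays are kept.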

To your second question: the values with which you initialise the final column of the dynamic programming array do not necessarily need to be 1's. You can instead model the transition probabilities to a (silent) end state, which produces an HMM that can generate sequences of varying length.
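As a sketch of that variant (again with made-up numbers): give each state $i$ a probability $a_{i,\text{end}}$ of transitioning to a silent end state, so each row of `A` plus its end-state entry sums to 1, and initialise $\beta_T(i) = a_{i,\text{end}}$ instead of 1. Both directions then agree on the probability of the sequence ending exactly at $T$:

```python
import numpy as np

# Hypothetical 2-state HMM with explicit end-state transitions.
A = np.array([[0.6, 0.3],
              [0.3, 0.5]])          # rows no longer sum to 1 ...
a_end = np.array([0.1, 0.2])        # ... remainder goes to a silent end state
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O = [0, 1, 2, 1]
N, T = A.shape[0], len(O)

# Forward pass, then absorb into the end state at time T.
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
p_fwd = (alpha[-1] * a_end).sum()

# Backward pass initialised with end-state transitions instead of 1's.
beta = np.zeros((T, N))
beta[-1] = a_end                    # beta_T(i) = a_{i,end}
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
p_bwd = (pi * B[:, O[0]] * beta[0]).sum()

print(np.isclose(p_fwd, p_bwd))    # P(O, sequence ends at T | lambda)
```

The all-ones initialisation is just the special case where termination is equally likely from every state, i.e. sequence length is fixed externally rather than modelled.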