Hidden Markov Model – why backward probability is conditional on the current state

Tags: conditional-probability, hidden-markov-models

I'm trying to understand hidden Markov models (HMMs). Here is the material I studied.
It states that there are two assumptions in an HMM (page 3):

  1. $P( q_i | q_1, …, q_{i-1} ) = P( q_i | q_{i-1} )$
  2. $P(o_i | q_1, …, q_T, o_1, …, o_T) = P(o_i | q_i)$

where $q_i$ denotes the $i$-th state, $o_i$ denotes the $i$-th observation, and $T$ is the length of the sequence.

In the forward-backward algorithm, we need to define the backward probability (page 12, eq. A.15):

  • $\beta_t(i) = P(o_{t+1}, …, o_{T} | q_t=i)$

The material explains that the backward probability is "the probability of emitting the remaining sequence from t+1 until the end of time after being at hidden state i at time t".

My question is about assumption (2) and $\beta_t(i)$. Assumption (2) says that the $i$-th observation depends only on the $i$-th state. The backward probability only involves $o_{t+1}, …, o_T$, so those observations depend only on $q_{t+1}, …, q_T$, right? So I don't see why conditioning on $q_t=i$ is needed in the backward probability.
In other words, why can't we state that:

  • $\beta_t(i)=P(o_{t+1}, …, o_{T} | q_t=i) = P(o_{t+1}, …, o_{T})$

Many thanks!

Best Answer

Assumption $1$ should be
$$ P\big(q_i\,\big|\,q_1,q_2,\dots,q_{i-1},o_1,o_2,\dots,o_{i-1}\big)=P\big(q_i\,\big|\,q_{i-1}\big)\ . $$
That is, the distribution of the hidden state at time $i$, given all the preceding hidden states and all the preceding observations, depends only on the preceding state. Assumption $2$ should be
$$ P\big(o_i\,\big|\,q_1,q_2,\dots,q_T,o_1,\dots,o_{i-1},o_{i+1},\dots,o_T\big)=P\big(o_i\,\big|\,q_i\big)\ . $$
That is, the distribution of the observation at time $i$, given all the hidden states and all the other observations, depends only on the hidden state at time $i$.

In my experience, assumption $2$ is more commonly stated in the form
$$ P\big(o_i\,\big|\,q_1,q_2,\dots,q_i,o_1,\dots,o_{i-1}\big)=P\big(o_i\,\big|\,q_i\big)\ . $$
Although this may appear to be a weaker assumption at first sight, it is in fact equivalent, given assumption $1$.
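Here is one way to sketch that equivalence (my own filling-in of the step; it is not spelled out in the answer above). Writing $q_{1:i}$ for $q_1,\dots,q_i$, factor the joint distribution by the chain rule in the order $q_1, o_1, q_2, o_2, \dots, q_T, o_T$:
$$ P\big(q_{1:T}, o_{1:T}\big) = \prod_{i=1}^{T} P\big(q_i\,\big|\,q_{1:i-1}, o_{1:i-1}\big)\,P\big(o_i\,\big|\,q_{1:i}, o_{1:i-1}\big) = \prod_{i=1}^{T} P\big(q_i\,\big|\,q_{i-1}\big)\,P\big(o_i\,\big|\,q_i\big)\ , $$
where assumption $1$ rewrites the first factor, the weaker form of assumption $2$ rewrites the second, and we use the convention $P(q_1\,|\,q_0)=P(q_1)$. From this factorization, the conditional distribution of $o_i$ given all the other variables is proportional, as a function of $o_i$, to the single factor $P(o_i\,|\,q_i)$, which is exactly the stronger form of assumption $2$.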

You're quite correct that, given $q_1,q_2,\dots,q_T$, the distribution of $o_{t+1},o_{t+2},\dots,o_T$ depends only on $q_{t+1},q_{t+2},\dots,q_T$. The key to understanding the dependence of
$$ \beta_t(i)=P\big(o_{t+1},o_{t+2},\dots,o_T\,\big|\,q_t=i\big) $$
on $i$, however, is that you're not given any of $q_{t+1},q_{t+2},\dots,q_T$ in the conditioning event; you're given only $q_t$, and the distribution of the subsequent hidden states $q_{t+1},q_{t+2},\dots,q_T$ depends on the value of $q_t$. The dependence of $\beta_t(i)$ on $i$ comes from the fact that the distribution of $o_{t+1},o_{t+2},\dots,o_T$ depends on $q_{t+1},q_{t+2},\dots,q_T$, and these latter states pass their dependence on the value of $q_t$ through to that distribution.
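To make this concrete, here is a minimal numerical sketch (a toy two-state HMM with made-up transition and emission matrices, not taken from the linked material). It computes $\beta_t(i)$ via the standard backward recursion $\beta_T(i)=1$ and $\beta_t(i)=\sum_j a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$, and shows that $\beta_t(1)\neq\beta_t(2)$ in general:

```python
import numpy as np

# Toy two-state HMM (all numbers made up for illustration).
A = np.array([[0.9, 0.1],   # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],   # B[i, k] = P(o_t = k | q_t = i)
              [0.1, 0.9]])
obs = [0, 1, 1, 0]          # an observation sequence (0-based symbols)

T = len(obs)
N = A.shape[0]

# Backward recursion, 0-based time:
# beta[t, i] = P(obs[t+1], ..., obs[T-1] | state i at step t).
beta = np.ones((T, N))                  # base case: empty future has probability 1
for t in range(T - 2, -1, -1):
    # beta_t(i) = sum_j A[i, j] * B[j, obs[t+1]] * beta_{t+1}(j)
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

print(beta)
```

If $\beta_t(i)$ did not depend on $i$, every row of the printed array would have equal entries; running the sketch shows that the two columns differ at every $t < T$, precisely because the distribution of the future states, and hence of the future observations, depends on which state we are in at time $t$.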
