Markov Process – Understanding Emission Probability in HMM Definition

hidden-markov-model, markov-process

This is a rather basic question. I was going through Speech and Language Processing by Jurafsky and Martin. In the book, they define a Hidden Markov Model (HMM) as follows:

An HMM is specified by the following components:

  • $Q = q_1q_2 …q_N$ : a set of N states
  • $A = a_{11} …a_{ij} …a_{NN}$ : a transition probability matrix $A$, each $a_{ij}$ representing the probability of moving from state $i$ to state $j$, s.t. $\sum_{j=1}^N a_{ij}=1 \quad \forall i$
  • $O = o_1o_2 …o_T$ : a sequence of $T$ observations, each one drawn from a vocabulary $V =
    v_1, v_2,…, v_V$
  • $B = b_i(o_t)$ : a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation $o_t$ being generated
    from a state $q_i$
  • $\pi = \pi_1,\pi_2,…,\pi_N$: an initial probability distribution over states. $\pi_i$ is the probability that the Markov chain will start in state $i$. Some states $j$ may have $\pi_j = 0$, meaning that they cannot be initial states. Also, $\sum_{i=1}^N \pi_i = 1$

My question is: shouldn't the emission probabilities $B$ sum to 1? That is, shouldn't it be the case that $\sum_{i=1}^{N} b_i(o_t) = 1$ (or maybe $\sum_{t=1}^{T} b_i(o_t) = 1$)? If not, why not? If yes, why doesn't the book specify either of these?

Best Answer

You're right that a probability distribution should sum to 1, but not in the way that you wrote it. The sum of the probability mass over all events should be 1.

In other words, $\sum_{k=1}^{V} b_i\left(v_k\right) = 1$. The sum runs over the vocabulary: for a fixed state $i$, the probabilities of emitting each possible symbol $v_k$ add up to 1, which is what makes $b_i(\cdot)$ a normalized distribution. This holds whether you're at time $t=1$, at time $t=T$, or any time in between.

For each possible state $q_i$, you get a different summing-to-one distribution, each conditioned on being in state $q_i$.
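To make the distinction concrete, here is a minimal numeric sketch in NumPy (the state count, vocabulary size, and probabilities are made up for illustration). It checks that each row of the emission matrix $B$ sums to 1, while the column sums, the quantity the question asked about, are unconstrained.

```python
import numpy as np

# Hypothetical HMM with N = 2 hidden states and a vocabulary of V = 3 symbols.

# Transition matrix A: row i is the distribution over next states given state i,
# so each row sums to 1.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Emission matrix B: row i is b_i(.), the distribution over the vocabulary
# symbols given state i. It is these rows -- not a sum over states or over
# time steps -- that each sum to 1.
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# Initial distribution pi over the N states, summing to 1.
pi = np.array([0.8, 0.2])

assert np.allclose(A.sum(axis=1), 1.0)   # sum_j a_ij = 1 for every i
assert np.allclose(B.sum(axis=1), 1.0)   # sum_k b_i(v_k) = 1 for every i
assert np.isclose(pi.sum(), 1.0)         # sum_i pi_i = 1

# A column of B (fixing a symbol and summing over states) need not sum to 1:
print(B.sum(axis=0))  # [0.6 0.7 0.7] -- not constrained to be 1
```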
