I wake up in a random class and hear 6 biology-related words. How certain should I be that I’m in Biology class?

Tags: bayes-theorem, bayesian, bayesian network, probability

Suppose I'm sleeping in some class. I wake up and hear 6 topic-specific words that seem related to biology. I'm asked to guess whether I'm in Biology class. How confident should I be? I think this can be represented by the following Bayesian network, with one parent node and 6 child nodes.
[Figure: Bayesian network with one parent node (the class) and six child nodes (the words)]

Suppose that
$$\begin{aligned}
P(word_1|biology)&=0.6, & P(word_2|biology)&=0.6,\\
P(word_3|biology)&=0.7, & P(word_4|biology)&=0.7,\\
P(word_5|biology)&=0.8, & P(word_6|biology)&=0.8.
\end{aligned}$$

Suppose that I think there's some chance I could hear these words in some other class, such as chemistry. Hence, let $P(word_i|\neg biology)$ be $P(word_i|biology)-0.1$:

$$\begin{aligned}
P(word_1|\neg biology)&=0.5, & P(word_2|\neg biology)&=0.5,\\
P(word_3|\neg biology)&=0.6, & P(word_4|\neg biology)&=0.6,\\
P(word_5|\neg biology)&=0.7, & P(word_6|\neg biology)&=0.7.
\end{aligned}$$

My prior credence of being in biology class is $0.1$. How do I update to form a posterior after hearing these 6 words?


Upon hearing word 1, using Bayes rule I update as follows:

$$P(class=bio|word_1)=\frac{p(word_1|bio)*p(bio)}{p(word_1|bio)*p(bio)+p(word_1|\neg bio)*p(\neg bio)}=\frac{0.6*0.1}{(0.1*0.6)+(0.5*0.9)} \approx 0.1176$$

Do I keep updating like this sequentially for each word, plugging in the previous posterior as the next prior? For example:

$$P(class=bio|word_1,word_2)=\frac{p(word_2|bio)*p(bio|word_1)}{p(word_2|bio)*p(bio|word_1)+p(word_2|\neg bio)*p(\neg bio|word_1)}=\frac{0.6*0.1176}{(0.1176*0.6)+(0.5*0.8824)} \approx 0.1379$$

And so on… Is that correct?

Best Answer

Yes, your reasoning is correct: the posterior probability from each update becomes the prior probability for the next. (This is one of the nice things about the Bayesian approach.) Writing $P$ for the current credence, $p_w=P(w|bio)$ and $q_w=P(w|\neg bio)$, each update can be written as $$ P' = \frac{p_w P}{p_w P + q_w (1-P)}=\frac{p_w P}{q_w + (p_w - q_w)P}=\frac{P}{\alpha_w +(1-\alpha_w)P}, $$ where $\alpha_w=P(w|\neg bio) \div P(w|bio)$ is $5/6$ or $6/7$ or $7/8$ for your words. It's easy to check that the result after all six words comes out to $P=64/289\approx 0.221453$, and that this is independent of the order in which you do the updates.
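For anyone who wants to check this numerically, here is a minimal Python sketch of the sequential update, using the numbers from the question. The function names `update` and `sequential_posterior` are made up for illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction
from itertools import permutations

def update(prior, p_w, q_w):
    """One Bayes update: prior = P(bio), p_w = P(w|bio), q_w = P(w|not bio)."""
    return p_w * prior / (p_w * prior + q_w * (1 - prior))

def sequential_posterior(prior, likelihoods):
    """Feed each posterior back in as the prior for the next word."""
    for p_w, q_w in likelihoods:
        prior = update(prior, p_w, q_w)
    return prior

# Likelihoods from the question, kept as exact fractions.
p = [Fraction(n, 10) for n in (6, 6, 7, 7, 8, 8)]   # P(word_i | bio)
q = [x - Fraction(1, 10) for x in p]                # P(word_i | not bio)
pairs = list(zip(p, q))

posterior = sequential_posterior(Fraction(1, 10), pairs)
print(posterior)                    # 64/289
print(round(float(posterior), 6))   # 0.221453

# Order independence: every ordering of the six words gives the same result.
order_independent = all(
    sequential_posterior(Fraction(1, 10), perm) == posterior
    for perm in permutations(pairs)
)
print(order_independent)            # True
```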

In light of the other answer, it's worth noting that this is the same as the result from a single update with $\alpha=\prod_w \alpha_w=25/64$. That is, it's the same as treating the words as conditionally independent given the class, which is exactly what the diagram says. The advantage of the first approach, though, is that you can update your credence in an online fashion as you hear the words, allowing you to, say, take out your textbook as soon as you're sufficiently confident you're in the right class.
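Both views can be sketched in a few lines of Python. This is only an illustration: the helper `words_until_confident` and the 0.15 confidence threshold are assumptions of mine, not part of the question.

```python
from fractions import Fraction
from math import prod

prior = Fraction(1, 10)
p = [Fraction(n, 10) for n in (6, 6, 7, 7, 8, 8)]   # P(word_i | bio)
q = [x - Fraction(1, 10) for x in p]                # P(word_i | not bio)

# Batch view: one update using the product of the per-word ratios alpha_w.
alpha = prod(q_w / p_w for p_w, q_w in zip(p, q))
batch = prior / (alpha + (1 - alpha) * prior)
print(alpha, batch)   # 25/64 64/289

def words_until_confident(prior, p, q, threshold):
    """Online view: update word by word and stop once credence >= threshold.

    Returns (number of words heard, credence), or (None, final credence)
    if the threshold is never reached.
    """
    credence = prior
    for n, (p_w, q_w) in enumerate(zip(p, q), start=1):
        a = q_w / p_w
        credence = credence / (a + (1 - a) * credence)
        if credence >= threshold:
            return n, credence
    return None, credence

# Hypothetical threshold of 0.15: reached after the third word.
print(words_until_confident(prior, p, q, Fraction(3, 20)))
```

With these particular likelihoods the credence tops out at 64/289, so any threshold above roughly 0.22 is never reached after six words.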
