Bayesian – Bayesian Updating Explained with Coin Tossing Example


I have a question about Bayesian updating. In general, Bayesian updating refers to the process of obtaining the posterior from a prior belief distribution.

Alternatively, one could understand the term as using the posterior of the first step as the prior input for further calculation.

Below is a simple calculation example. Method a is the standard calculation. Method b uses the posterior output as the input prior for calculating the next posterior.



Problem: You toss a coin twice and get 2 Heads. What is the probability that the coin is fair, i.e. $Pr(Fair\ coin \mid HH)$?

Now for the first toss:
$Pr(Fair\ coin| H) = \frac{Pr(Head|Fair)\cdot P(Fair)}{Pr(Head|Fair) \cdot P(Fair)+Pr(Head|Biased) \cdot P(Biased)} = \frac{Pr(H|F)\cdot P(F)}{P(H)} \quad\quad (1)$

Assuming a starting prior belief P(Fair) = 0.5, we want to find P(F|H) for the first toss.

Below are the calculations for the intermediate steps:

$P(H|F)= {n \choose x} \theta^{x}(1-\theta)^{n-x} = {1 \choose 1} 0.5^{1}(0.5)^{0}= 0.5$

$P(H)= P(H|F) \cdot P(F)+ P(H|Biased) \cdot P(Biased)=(0.5 \cdot 0.5) +(1 \cdot 0.5) = 0.75$

(Note: P(H|Biased) = 1 because we assume an extreme example with Heads on both sides of the coin, so the probability of getting Heads with a biased coin is 1, which keeps the calculation easy.)

Hence, plugging into (1), we get :

$Pr(F| H) =\frac{Pr(H|F)\cdot P(F)}{P(H)} = \frac{0.5 \cdot 0.5}{0.75} = 0.33$
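(As a quick numeric check of this first step, here is a minimal Python sketch; the variable names are my own, just for illustration.)

```python
# Prior beliefs about the coin
p_fair = 0.5
p_biased = 0.5

# Likelihood of a single Head under each hypothesis
p_h_given_fair = 0.5    # fair coin
p_h_given_biased = 1.0  # two-headed ("biased") coin

# Bayes' theorem for the first toss
p_h = p_h_given_fair * p_fair + p_h_given_biased * p_biased  # 0.75
p_fair_given_h = p_h_given_fair * p_fair / p_h
print(p_fair_given_h)  # 0.3333...
```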


Now, we toss the coin again and get another H. To calculate $Pr(F|HH)$, we

a) continue using P(Fair)=0.5

$Pr(F|HH) = \frac{Pr(HH|F)\cdot P(F)}{Pr(HH|F) \cdot P(F)+Pr(HH|Biased) \cdot P(Biased)} = \frac{Pr(HH|F)\cdot P(F)}{P(HH)} \quad\quad (2)$

$P(HH|F)= {n \choose x} \theta^{x}(1-\theta)^{n-x} = {2 \choose 2} 0.5^{2}(0.5)^{0}= 0.25$

$P(HH)= P(HH|F) \cdot P(F)+ P(HH|Biased) \cdot P(Biased)=(0.25 \cdot 0.5) +(1 \cdot 0.5) = 0.625$

Hence, plugging into (2),
$Pr(F|HH) =\frac{Pr(HH|F)\cdot P(F)}{P(HH)} = \frac{0.25 \cdot 0.5}{0.625} = 0.2$
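(Again as a quick numeric check, a sketch in the same style as above:)

```python
# Method a: keep the original prior and use the full two-Heads likelihood
p_fair = 0.5
p_biased = 0.5

p_hh_given_fair = 0.5 ** 2  # 0.25
p_hh_given_biased = 1.0     # a two-headed coin always gives HH

p_hh = p_hh_given_fair * p_fair + p_hh_given_biased * p_biased  # 0.625
print(p_hh_given_fair * p_fair / p_hh)  # 0.2
```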


Alternatively, what if we calculate $Pr(F| HH) $ by using

b) our updated belief P(Fair)=0.33 which we got from Pr(F|H) in the first step

In this case,

$P(HH|F)= {n \choose x} \theta^{x}(1-\theta)^{n-x} = {2 \choose 2} 0.33^{2}(1-0.33)^{0}= 0.1089$

$P(HH)= P(HH|F) \cdot P(F)+ P(HH|Biased) \cdot P(Biased)=(0.1089 \cdot 0.33) +(1 \cdot 0.67) = 0.705937$

Hence, plugging into (2),
$Pr(F|HH) =\frac{Pr(HH|F)\cdot P(F)}{P(HH)} = \frac{0.1089 \cdot 0.33}{0.705937} = 0.05091$
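(For completeness, a sketch that simply reproduces the method b arithmetic above; this is the approach whose validity I am asking about, not a recommendation.)

```python
# Method b as computed above: the posterior 0.33 is re-used both as the
# new prior and as the success probability inside the binomial likelihood
p_fair = 0.33
p_biased = 0.67

p_hh_given_fair = 0.33 ** 2  # 0.1089
p_hh_given_biased = 1.0

p_hh = p_hh_given_fair * p_fair + p_hh_given_biased * p_biased  # 0.705937
print(p_hh_given_fair * p_fair / p_hh)  # ~0.0509
```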


Using method a, we get P(F|HH) = 0.2; using method b, we get P(F|HH) ≈ 0.05.
My question is: to what extent is method b a valid approach?

Best Answer

Your approach b) is wrong: both single-step updating, in which all data are used together to update the prior and arrive at the posterior, and Bayesian sequential (also called recursive) updating, in which data are used one at a time to obtain a posterior that becomes the prior for the next iteration, must give exactly the same result. This is one of the pillars of Bayesian statistics: consistency.

Your error is simple: once you have updated the prior with the first sample (the first "Head"), you have only one remaining sample to include in your likelihood in order to update the new prior. In formulas:

$$P(F|HH) =\frac{P(H|H,F)P(F|H)}{P(H|H)} $$

This formula is just Bayes' theorem, applied after the first event "Head" has already happened: since conditional probabilities are probabilities themselves, Bayes' theorem is also valid for probabilities conditioned on the event "Head", and there is really nothing more to prove. However, I have found that sometimes people don't find this result self-evident, so I give a slightly long-winded proof.

$$P(F|HH) =\frac{P(HH|F)P(F)}{P(HH)}= \frac{P(H|H,F)P(H|F)P(F)}{P(HH)}$$

by the chain rule of conditional probabilities. Then, multiplying numerator and denominator by $P(H)$, you get

$$\begin{aligned}
\frac{P(H|H,F)P(H|F)P(F)}{P(HH)} &= \frac{P(H|H,F)P(H|F)P(F)P(H)}{P(HH)P(H)} \\
&= \frac{P(H|H,F)P(H)}{P(HH)}\cdot\frac{P(H|F)P(F)}{P(H)} \\
&= \frac{P(H|H,F)}{P(H|H)}\cdot\frac{P(H|F)P(F)}{P(H)} \\
&= \frac{P(H|H,F)P(F|H)}{P(H|H)}
\end{aligned}$$

where in the last step I just applied Bayes' theorem. Now:

$$P(H|H,F)= P(H|F)=0.5$$

This is obvious: conditionally on the coin being fair (or biased), we are modelling the coin tosses as i.i.d. Applying this same idea to the denominator, we get:

$$P(H|H)= P(H|F,H)P(F|H)+P(H|B,H)P(B|H)=P(H|F)P(F|H)+P(H|B)P(B|H)=0.5\cdot0.\bar{3}+1\cdot0.\bar{6}=\frac{5}{6}$$

Finally:

$$P(F|HH) =\frac{P(H|H,F)P(F|H)}{P(H|H)}=\frac{0.5\cdot0.\bar{3}}{0.5\cdot0.\bar{3}+1\cdot0.\bar{6}}=0.2$$

QED
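To see the consistency numerically, here is a small Python sketch of the correct sequential update (my own illustration, not from any particular library): the posterior from the first toss becomes the prior, and the likelihood uses only the one remaining observation.

```python
# Single-toss likelihoods under each hypothesis (i.i.d. given the coin)
p_h_given_fair = 0.5
p_h_given_biased = 1.0

# Step 1: update the prior P(Fair) = 0.5 with the first Head
prior_fair = 0.5
evidence = p_h_given_fair * prior_fair + p_h_given_biased * (1 - prior_fair)
posterior_fair = p_h_given_fair * prior_fair / evidence  # 1/3

# Step 2: the posterior becomes the prior; the likelihood uses ONLY the second Head
prior_fair = posterior_fair
evidence = p_h_given_fair * prior_fair + p_h_given_biased * (1 - prior_fair)
posterior_fair = p_h_given_fair * prior_fair / evidence
print(posterior_fair)  # 0.2, exactly the batch (method a) result
```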


That's it: have fun using Bayesian sequential updating; it's very useful in a lot of situations! If you want to know more, there are many good resources on the Internet.
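As a parting sketch (a hypothetical little helper of my own, nothing standard), here is the same idea wrapped in a function so you can feed in any sequence of tosses one at a time:

```python
def sequential_update(tosses, prior_fair=0.5, p_h_fair=0.5, p_h_biased=1.0):
    """Update P(Fair) one toss at a time; 'H' = Heads, 'T' = Tails."""
    p_fair = prior_fair
    for toss in tosses:
        # Likelihood of this single observation under each hypothesis
        lik_fair = p_h_fair if toss == 'H' else 1 - p_h_fair
        lik_biased = p_h_biased if toss == 'H' else 1 - p_h_biased
        evidence = lik_fair * p_fair + lik_biased * (1 - p_fair)
        p_fair = lik_fair * p_fair / evidence  # posterior becomes the next prior
    return p_fair

print(sequential_update("HH"))   # 0.2, matches the batch calculation
print(sequential_update("HHT"))  # 1.0: one Tail rules out the two-headed coin
```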