Conditional Probability vs Bayes Rule – Key Differences Explained

Tags: bayesian, conditional-probability, probability

I know that Bayes' rule is derived from conditional probability. But intuitively, what is the difference? The equations look the same to me. The numerator is the joint probability and the denominator is the probability of the given outcome.

This is the conditional probability: $P(A\mid B)=\frac{P(A \cap B)}{P(B)}$.

This is Bayes' rule: $P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}$.

Aren't $P(B\mid A)\,P(A)$ and $P(A \cap B)$ the same thing? And when $A$ and $B$ are independent, there is no need to use Bayes' rule, right?
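
For concreteness, here is a quick numeric sanity check (a minimal Python sketch; the two die events are made up for illustration) that the two numerators, and hence the two formulas, agree:

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
# A = "roll is even", B = "roll is at least 4" (hypothetical illustrative events).
omega = range(1, 7)
A = {2, 4, 6}
B = {4, 5, 6}

def p(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

p_A_and_B = p(A & B)                    # P(A ∩ B) = 1/3
p_B_given_A = p(A & B) / p(A)           # P(B | A) by definition
print(p_B_given_A * p(A) == p_A_and_B)  # True: the two numerators agree

# Both routes give the same P(A | B):
print(p(A & B) / p(B))                  # conditional-probability definition
print(p_B_given_A * p(A) / p(B))        # Bayes' rule
```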

Best Answer

OK, now that you have updated your question to include the two formulas:

$$P(A\mid B) = \frac{P(A\cap B)}{P(B)} ~~ \text{provided that } P(B) > 0, \tag{1}$$ is the definition of the conditional probability of $A$ given that $B$ occurred. Similarly, $$P(B\mid A) = \frac{P(B\cap A)}{P(A)} = \frac{P(A\cap B)}{P(A)} ~~ \text{provided that } P(A) > 0, \tag{2}$$ is the definition of the conditional probability of $B$ given that $A$ occurred. Now, it is indeed a trivial matter to substitute the value of $P(A\cap B)$ from $(2)$ into $(1)$ to arrive at $$P(A\mid B) = \frac{P(B\mid A)P(A)}{P(B)} ~~ \text{provided that } P(A), P(B) > 0, \tag{3}$$ which is Bayes' formula. But notice that Bayes' formula actually connects two *different* conditional probabilities, $P(A\mid B)$ and $P(B\mid A)$, and is essentially a formula for "turning the conditioning around". The Reverend Thomas Bayes referred to this in terms of "inverse probability", and even today there is vigorous debate as to whether statistical inference should be based on $P(B\mid A)$ or on the inverse probability (called the a posteriori or posterior probability).
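
That substitution of $(2)$ into $(1)$ can even be checked mechanically. Here is a minimal symbolic sketch (assuming `sympy` is available; the symbol names are mine) showing that the right-hand side of $(3)$ reduces to the right-hand side of $(1)$:

```python
import sympy as sp

pA, pB, pAB = sp.symbols('P_A P_B P_AB', positive=True)

# Definitions (1) and (2) as symbolic expressions:
A_given_B = pAB / pB          # P(A|B) = P(A ∩ B) / P(B)
B_given_A = pAB / pA          # P(B|A) = P(A ∩ B) / P(A)

# Substituting P(A ∩ B) = P(B|A) P(A) into (1) recovers Bayes' rule (3):
bayes_rhs = B_given_A * pA / pB
print(sp.simplify(A_given_B - bayes_rhs) == 0)   # True
```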

It is undoubtedly as galling to you as it was to me when I first discovered that Bayes' formula is just a trivial substitution of $(2)$ into $(1)$. Perhaps if you had been born 250 years ago, you (Note: the OP masqueraded under the username AlphaBetaGamma when I wrote this answer but has since changed his username) could have made the substitution, and then people today would be talking about the AlphaBetaGamma formula, the AlphaBetaGammian heresy, and the Naive AlphaBetaGamma method$^*$ instead of invoking Bayes' name everywhere. So let me console you on your loss of fame by pointing out a different version of Bayes' formula. The Law of Total Probability says that $$P(B) = P(B\mid A)P(A) + P(B\mid A^c)P(A^c) \tag{4}$$ and using this, we can write $(3)$ as

$$P(A\mid B) = \frac{P(B\mid A)P(A)}{P(B\mid A)P(A) + P(B\mid A^c)P(A^c)}, \tag{5}$$ or more generally as $$P(A_i\mid B) = \frac{P(B\mid A_i)P(A_i)}{P(B\mid A_1)P(A_1) + P(B\mid A_2)P(A_2) + \cdots + P(B\mid A_n)P(A_n)}, \tag{6}$$ where the posterior probability of a possible "cause" $A_i$ of a "datum" $B$ is related to $P(B\mid A_i)$, the likelihood of the observation $B$ when $A_i$ is the true hypothesis, and $P(A_i)$, the prior probability (horrors!) of the hypothesis $A_i$.
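
To see formulas $(4)$ and $(5)$ at work, here is a minimal numeric sketch in Python (the scenario and all the numbers are made up for illustration): a rare condition $A$ and a fairly accurate test $B$, where the prior, likelihood, and false-positive rate are hypothetical inputs.

```python
# A = "patient has the condition", B = "test comes back positive".
p_A = 0.01                # hypothetical prior probability P(A)
p_B_given_A = 0.95        # hypothetical likelihood P(B|A)
p_B_given_Ac = 0.05       # hypothetical false-positive rate P(B|A^c)

# Denominator via the Law of Total Probability, formula (4):
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Posterior probability of A given the observed positive test, formula (5):
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)        # ≈ 0.161
```

Note how the conditioning has been turned around: $P(B\mid A) = 0.95$ while $P(A\mid B) \approx 0.161$, because the small prior $P(A_i)$ weighs down the posterior.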


$^*$ There is a famous paper by R. Alpher, H. Bethe, and G. Gamow, "The Origin of Chemical Elements" (Physical Review, April 1, 1948), that is commonly referred to as the $\alpha\beta\gamma$ paper.
