How is Bayes' Rule being applied to arrive at the following formula?

bayes-theorem, bayesian-network, probability, probability-theory

In Sebastian Thrun's Intro to Artificial Intelligence course on Udacity, Problem Set 2, Simple Bayes Net, he asks the following question.

Given a simple Bayes network with a single causal variable, $A$, and three conditionally independent variables, $X_1, X_2, X_3$, that depend on the causal variable, what is $P(A|X_1, X_2, \neg X_3)$?

In the answer video, his first step is

$$P(A|X_1, X_2, \neg X_3) = \frac{P(\neg X_3|A, X_1, X_2)P(A|X_1, X_2)}{P(\neg X_3|X_1, X_2)}$$

He says he arrives there using Bayes' Rule, but that looks nothing like Bayes' Rule as I know it. I am able to arrive at the resulting equation over several steps, beginning with Bayes' Rule but also requiring things like the definition of conditional probability and transforming joint probabilities into conditional probabilities multiplied by a prior. Can anyone explain (A) why he seems to think it is self-evident that he is using Bayes' Rule, and (B) why he would bother to transform the equation to produce terms like $P(\neg X_3|A, X_1, X_2)$, for which no value was given in the question? I was able to solve this easily just by applying Bayes' Rule and evaluating the terms using the provided values ($P(A) = .5$, $P(X_i|A) = .2$, $P(X_i|\neg A) = .6$).
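For what it's worth, here is a sketch of the direct computation I mean (my own numerical check, not from the course), using the given values and the conditional independence of the $X_i$ given $A$:

```python
# Direct computation of P(A | X1, X2, ~X3) from the given values
# P(A) = 0.5, P(Xi|A) = 0.2, P(Xi|~A) = 0.6, using conditional
# independence of the Xi given A.
p_a = 0.5
p_x_given_a = 0.2
p_x_given_not_a = 0.6

# Likelihood of the evidence (X1, X2, ~X3) under each hypothesis
lik_a = p_x_given_a * p_x_given_a * (1 - p_x_given_a)                  # 0.2 * 0.2 * 0.8
lik_not_a = p_x_given_not_a * p_x_given_not_a * (1 - p_x_given_not_a)  # 0.6 * 0.6 * 0.4

# Bayes' Rule: posterior = likelihood * prior / evidence
evidence = lik_a * p_a + lik_not_a * (1 - p_a)
posterior = lik_a * p_a / evidence
print(round(posterior, 4))  # 0.1818, i.e. 2/11
```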

Edit:

Here is how I was able to arrive at his "first" step:

$\frac{P(X_1, X_2, \neg X_3|A)P(A)}{P(X_1, X_2, \neg X_3)}$ via Bayes Rule

$\frac{\frac{P(X_1, X_2, \neg X_3, A)}{P(A)}P(A)}{P(X_1, X_2, \neg X_3)}$ via the definition of conditional probability

$\frac{P(X_1, X_2, \neg X_3, A)}{P(X_1, X_2, \neg X_3)}$ via cancellation

$\frac{P(\neg X_3|A, X_1, X_2)P(A, X_1, X_2)}{P(X_1, X_2, \neg X_3)}$ by the definition of conditional probability

$\frac{P(\neg X_3|A, X_1, X_2)P(A|X_1, X_2)P(X_1, X_2)}{P(X_1, X_2, \neg X_3)}$ via the definition of conditional probability

$\frac{P(\neg X_3|A, X_1, X_2)P(A|X_1, X_2)P(X_1, X_2)}{P(\neg X_3|X_1, X_2)P(X_1, X_2)}$ via the definition of conditional probability

$\frac{P(\neg X_3|A, X_1, X_2)P(A|X_1, X_2)}{P(\neg X_3|X_1, X_2)}$ by cancellation
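The derivation above can be sanity-checked numerically by enumerating the full joint distribution and comparing his "first step" against the direct posterior (my own check, using the values given in the problem):

```python
# Enumerate the joint distribution over (A, X1, X2, X3) and verify that
#   P(A|X1,X2,~X3) = P(~X3|A,X1,X2) * P(A|X1,X2) / P(~X3|X1,X2)
# agrees with the posterior computed directly.
from itertools import product

P_A = 0.5

def p_x(x, a):
    # P(Xi = x | A = a), with P(Xi|A) = 0.2 and P(Xi|~A) = 0.6
    p = 0.2 if a else 0.6
    return p if x else 1 - p

joint = {}
for a, x1, x2, x3 in product([True, False], repeat=4):
    pa = P_A if a else 1 - P_A
    joint[(a, x1, x2, x3)] = pa * p_x(x1, a) * p_x(x2, a) * p_x(x3, a)

def marg(pred):
    # Total probability of all worlds satisfying the predicate
    return sum(p for w, p in joint.items() if pred(*w))

# Direct: P(A | X1, X2, ~X3)
direct = (marg(lambda a, x1, x2, x3: a and x1 and x2 and not x3)
          / marg(lambda a, x1, x2, x3: x1 and x2 and not x3))

# Thrun's decomposition, term by term
p_nx3_given_a_x1_x2 = (marg(lambda a, x1, x2, x3: a and x1 and x2 and not x3)
                       / marg(lambda a, x1, x2, x3: a and x1 and x2))
p_a_given_x1_x2 = (marg(lambda a, x1, x2, x3: a and x1 and x2)
                   / marg(lambda a, x1, x2, x3: x1 and x2))
p_nx3_given_x1_x2 = (marg(lambda a, x1, x2, x3: x1 and x2 and not x3)
                     / marg(lambda a, x1, x2, x3: x1 and x2))

decomposed = p_nx3_given_a_x1_x2 * p_a_given_x1_x2 / p_nx3_given_x1_x2
print(direct, decomposed)  # both equal 2/11 ≈ 0.1818
```

Both routes give the same posterior, which at least confirms the algebra.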

Best Answer

Bayes' Rule says that $P(A|B,I) = \frac{P(B|A,I)P(A|I)}{P(B|I)}$. If you set $A=A$, $B=\neg X_3$, and $I=(X_1,X_2)$, then you get the identity in the video. Bayes' Rule is often stated without $I$, but strictly speaking it is part of the rule: all probabilities are conditioned on something.

$I$ is the background hypothesis on which everything else is conditioned. A common example of Bayes' Theorem is the case where someone is tested for a disease that has a base rate of 1%, and a test with a 95% accuracy rate comes back positive. Then $A$ would be "the patient has the disease", $B$ would be "the test came back positive", and $I$ is everything you're taking as given, such as "the test has a 95% accuracy rate", etc.
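That example can be worked through numerically. One has to assume "95% accuracy" means both sensitivity and specificity are 0.95 (the standard textbook reading, though the phrase is ambiguous):

```python
# Disease-test example, assuming "95% accuracy" means both
# sensitivity and specificity equal 0.95.
base_rate = 0.01   # P(disease) -- part of the background I
sens = 0.95        # P(positive | disease)
spec = 0.95        # P(negative | no disease)

# Total probability of a positive test
p_pos = sens * base_rate + (1 - spec) * (1 - base_rate)

# Bayes' Rule: P(disease | positive)
p_disease_given_pos = sens * base_rate / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Despite the "95% accurate" test, the posterior is only about 16%, because the 1% base rate sitting in $I$ dominates.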
