Probability – Understanding the Chain Rule in Probability Theory

chain rule, probability

When my teacher introduced the chain rule I found it quite easy, but when I try to prove something with it I get confused about which forms of the rule are allowed. For example, I can't see why I can write:

$$
p(x,y\mid z)=p(y\mid z)p(x\mid y,z)
$$

I cannot see how one arrives at this equation from the general rule. Could you help me think about this rule correctly?


I found this post useful for my question:

Is order of variables important in probability chain rule

Best Answer

$$p(x,y|z) = \frac{p(x,y,z)}{p(z)} = \frac{p(x|y,z)p(y,z)}{p(z)} = p(x|y,z)p(y|z)$$

In the first step we use the definition of conditional probability, $p(a|b) = p(a,b)/p(b)$, which requires $p(z) > 0$. In the second step we apply the same definition to the numerator, rewriting the joint probability $p(x,y,z)$ as the product of a conditional $p(x|y,z)$ and a joint $p(y,z)$ (this needs $p(y,z) > 0$). Finally, we divide $p(y,z)$ by $p(z)$, which by the same definition is $p(y|z)$, and we obtain the result.
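As a sanity check, the identity can be verified numerically. The snippet below is only a sketch: the joint table `weights` is made up for illustration, and the helper names `marginal` and `check_chain_rule` are hypothetical, not from any library. It uses exact fractions so the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary variables (x, y, z).
# The weights are arbitrary; dividing by their sum makes them a valid pmf.
weights = {(0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 4,
           (1, 0, 0): 5, (1, 0, 1): 6, (1, 1, 0): 7, (1, 1, 1): 8}
total = sum(weights.values())
joint = {k: Fraction(w, total) for k, w in weights.items()}


def marginal(x=None, y=None, z=None):
    """Sum the joint over every variable left as None."""
    return sum(p for (xi, yi, zi), p in joint.items()
               if (x is None or xi == x)
               and (y is None or yi == y)
               and (z is None or zi == z))


def check_chain_rule():
    """Return True if p(x,y|z) == p(x|y,z) * p(y|z) for every (x, y, z)."""
    for x, y, z in product((0, 1), repeat=3):
        lhs = marginal(x, y, z) / marginal(z=z)                # p(x,y|z)
        p_x_given_yz = marginal(x, y, z) / marginal(y=y, z=z)  # p(x|y,z)
        p_y_given_z = marginal(y=y, z=z) / marginal(z=z)       # p(y|z)
        if lhs != p_x_given_yz * p_y_given_z:
            return False
    return True


print(check_chain_rule())
```

Every entry of this table is positive, so none of the divisions hits a zero denominator; with a table containing zeros, the cases where $p(z) = 0$ or $p(y,z) = 0$ would have to be skipped.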

Another way of looking at it: a variable that sits on the right-hand side of the conditioning bar in every term can be set aside temporarily. Without $z$, the expression is just the usual definition of conditional probability:

$$p(x,y) = p(x|y)p(y)$$

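Conditioning every term of this identity on $z$ then gives your original formula. This works because, for fixed $z$ with $p(z) > 0$, $p(\cdot|z)$ is itself a valid probability distribution and so obeys the same rules.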
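The same idea extends to the general chain rule. As a sketch, writing the variables as $x_1, \dots, x_n$ (notation introduced here for illustration, not taken from the question), the unconditioned rule and its counterpart conditioned on $z$ are:

$$p(x_1,\dots,x_n) = \prod_{i=1}^{n} p(x_i|x_1,\dots,x_{i-1})$$

$$p(x_1,\dots,x_n|z) = \prod_{i=1}^{n} p(x_i|x_1,\dots,x_{i-1},z)$$

Taking $n = 2$ with $x_1 = y$ and $x_2 = x$ in the second line gives $p(y,x|z) = p(y|z)p(x|y,z)$, which is the formula in the question.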
