Problem seeing how $P(X_1=x_1|X_2=x_2) = E[P(X_1=x_1|X_3) | X_2 = x_2]$

conditional probability, expected value, probability

I am reading a text where they claim that
$$\begin{aligned}
P(X_1=x_1|X_2=x_2) &= E[P(X_1=x_1|X_3) | X_2 = x_2] \\
&= \sum_{x_3} P(X_1 = x_1|X_2 = x_2, X_3 = x_3)P(X_3=x_3|X_2=x_2) \quad (1.),
\end{aligned}$$

where $X_1, X_2, X_3$ are discrete random variables and $P(A, B)$ is notation for $P(A \land B)$.

I can see why
$$P(X_1=x_1|X_2=x_2) = \sum_{x_3} P(X_1 = x_1|X_2 = x_2, X_3 = x_3)P(X_3=x_3|X_2=x_2)$$
holds: the events $X_3 = x_3$, for $x_3$ ranging over the values of $X_3$, partition the sample space, so the identity follows from the law of total probability and the definition of conditional probability. But I don't see why the first or the second equality in $(1.)$ holds. It seems that I have the wrong interpretation of

$$E[P(X_1=x_1|X_3) | X_2 = x_2] \quad (2.) $$

I would like to understand where my reading of $(2.)$ goes wrong. This is how I interpret it:

The random variables are defined on some sample space $S$, so that, for example, $X_1 = x_1$ is shorthand for the event $\{s \in S : X_1(s) = x_1\}$. Define the random variable
$$X_4: S \to [0, 1], \quad X_4(s) = P(X_1=x_1|X_3 = X_3(s)) \quad (3.)$$
Then we have that $(2.)$ can be written as
$$
E[P(X_1=x_1|X_3) | X_2 = x_2] = E[X_4 | X_2 = x_2] = \sum_{x_4} x_4 P(X_4=x_4 | X_2 = x_2)
$$

Here I am stuck. One reason is that I don't understand how to go from summing over $x_4$ to summing over $x_3$, as is done in $(1.)$. Have I started correctly, and how can I proceed?

Best Answer

I don't think we can equate $P(X_1=x_1|X_2=x_2)$ and $E\Big(P(X_1=x_1|X_3)|X_2=x_2\Big)$ in general. To see an example of this, suppose that $(X_1,X_2,X_3)\sim p$, where $p$ is the pmf defined below:

$$\begin{eqnarray*}
p(1,1,1)&=&0.2 \\
p(1,1,2)&=&0.1 \\
p(1,2,1)&=&0.01 \\
p(1,2,2)&=&0.13 \\
p(2,1,1)&=&0.06 \\
p(2,1,2)&=&0.11 \\
p(2,2,1)&=&0.09 \\
p(2,2,2)&=&0.3
\end{eqnarray*}$$

Assume $p(x,y,z)=0$ for all $(x,y,z)\notin \{1,2\}^3$. It's not difficult to verify that

$$P(X_1=1|X_2=2)=\frac{0.01+0.13}{0.01+0.13+0.09+0.3}=\frac{14}{53}\approx 0.264$$

On the other hand, we get with LOTUS that

$$\begin{eqnarray*}E\Big(P(X_1=1|X_3)|X_2=2\Big) &=& \sum_{a,b\in \{1,2\}}P(X_1=1|X_3=b)P(X_1=a,X_3=b|X_2=2) \\ &=& \sum_{b\in \{1,2\}}P(X_1=1|X_3=b) \sum_{a\in \{1,2\}}P(X_1=a,X_3=b|X_2=2) \\ &=& \sum_{b\in \{1,2\}}P(X_1=1|X_3=b)P(X_3=b|X_2=2) \\ &=& \frac{7}{12}\cdot \frac{10}{53}+ \frac{23}{64} \cdot \frac{43}{53} \\ &=& \frac{4087}{10176}\approx 0.402 \\ &\neq & \frac{14}{53} \end{eqnarray*}$$

If you wish to carry out this computation without the aid of LOTUS (as you started to do), we first need to establish the conditional pmf of $P(X_1=1|X_3)$ given $X_2=2$. A brief calculator exercise reveals that the random variable $P(X_1=1|X_3)$ is supported on the set $\Big\{\frac{7}{12},\frac{23}{64}\Big\}$ and satisfies

$$P\Big(P(X_1=1|X_3)=\frac{7}{12}\Big|X_2=2\Big)=P(X_3=1|X_2=2)=\frac{10}{53}$$

$$P\Big(P(X_1=1|X_3)=\frac{23}{64}\Big|X_2=2\Big)=P(X_3=2|X_2=2)=\frac{43}{53}$$

This is the step that turns your sum over $x_4$ into a sum over $x_3$: here each value of $X_4 = P(X_1=1|X_3)$ corresponds to exactly one value of $X_3$. Finally,

$$\begin{eqnarray*}E\Big(P(X_1=1|X_3)|X_2=2\Big)&=&\sum_{t\in\big\{\frac{7}{12},\frac{23}{64}\big\}}t P\Big(P(X_1=1|X_3)=t\Big|X_2=2\Big) \\ &=& \frac{7}{12}\cdot \frac{10}{53}+ \frac{23}{64} \cdot \frac{43}{53} \\ &\neq& \frac{14}{53} \end{eqnarray*}$$

As I mentioned in the comments, we may certainly conclude that

$$P(X_1=1|X_2=2)=E\Big(P(X_1=1|X_2,X_3)|X_2=2\Big)$$

because the right-hand side expands to $\sum_{b\in\{1,2\}}P(X_1=1|X_2=2,X_3=b)P(X_3=b|X_2=2)$, which is exactly the law of total probability sum from the question.
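The arithmetic in this counterexample can be checked mechanically. The following is a minimal sketch in Python using exact fractions; the helper `P` and the variable names (`x4`, `e_lotus`, `e_direct`, `tower`) are illustrative choices made for this check and do not come from the text being discussed.

```python
from fractions import Fraction

# Joint pmf p(x1, x2, x3) from the counterexample, stored as exact fractions.
p = {
    (1, 1, 1): Fraction(20, 100), (1, 1, 2): Fraction(10, 100),
    (1, 2, 1): Fraction(1, 100),  (1, 2, 2): Fraction(13, 100),
    (2, 1, 1): Fraction(6, 100),  (2, 1, 2): Fraction(11, 100),
    (2, 2, 1): Fraction(9, 100),  (2, 2, 2): Fraction(30, 100),
}
assert sum(p.values()) == 1

IDX = {"x1": 0, "x2": 1, "x3": 2}

def P(**fixed):
    """Probability that the named coordinates take the given values,
    e.g. P(x1=1, x3=2) = P(X1 = 1, X3 = 2). (Hypothetical helper.)"""
    return sum(q for xs, q in p.items()
               if all(xs[IDX[k]] == v for k, v in fixed.items()))

# Left-hand side: P(X1 = 1 | X2 = 2).
lhs = P(x1=1, x2=2) / P(x2=2)

# X4 = P(X1 = 1 | X3), tabulated as a function of the value c of X3.
x4 = {c: P(x1=1, x3=c) / P(x3=c) for c in (1, 2)}

# LOTUS: sum over values of X3, weighted by P(X3 = c | X2 = 2).
e_lotus = sum(x4[c] * (P(x2=2, x3=c) / P(x2=2)) for c in (1, 2))

# Without LOTUS: build the conditional pmf of X4 given X2 = 2, then sum over its values.
pmf_x4 = {}
for (a, b, c), q in p.items():
    if b == 2:
        pmf_x4[x4[c]] = pmf_x4.get(x4[c], 0) + q / P(x2=2)
e_direct = sum(t * w for t, w in pmf_x4.items())

# Conditioning the inner probability on both X2 and X3 recovers the left-hand side.
tower = sum(
    (P(x1=1, x2=2, x3=c) / P(x2=2, x3=c)) * (P(x2=2, x3=c) / P(x2=2))
    for c in (1, 2)
)

print(lhs, e_lotus, e_direct, tower)  # 14/53 4087/10176 4087/10176 14/53
```

Both routes to $E\big(P(X_1=1|X_3)|X_2=2\big)$ give $\frac{4087}{10176}$, while the version that conditions on $(X_2, X_3)$ gives $\frac{14}{53}$. The two sums coincide whenever $P(X_1=1|X_2=2,X_3=b)=P(X_1=1|X_3=b)$ for every $b$ with $P(X_3=b|X_2=2)>0$, for instance when $X_1$ is conditionally independent of $X_2$ given $X_3$. If the text being read assumes something of that kind, that would account for the first equality in $(1.)$, but this is a guess about the source rather than something stated in the question.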
