Justification of proof of law of total expectation

Tags: conditional-expectation, probability, probability-distributions

I am reading the following proof of the law of total expectation for the discrete case:

$$ \begin{align}
E[E[X|Y]] &= E \left[ \sum_{x} x \cdot P(X = x \mid Y) \right] \\
&= \sum_y \left[ \sum_{x} x \cdot P(X = x \mid Y = y) \right] P(Y = y) \tag{2} \\
&= \sum_x x \sum_y P(X = x \mid Y = y) \cdot P(Y = y) \tag{3} \\
&= \sum_x x \sum_y P(X = x \, \text{and} \, Y = y) \tag{4} \\
&= \sum_x x \cdot P(X = x) \tag{5}\\
&= E[X]
\end{align}
$$

Question: Why does it assume $Y=y$? And can someone give a justification for each step?

Best Answer

Question: Why does it assume $Y=y$? And can someone give a justification for each step?

The second line does not assume $Y=y$ any more than the first line assumes $X=x$. Both apply the definition of expectation for discrete random variables.

$$\mathsf E(g(Z))~=~\sum_z g(z)\,\mathsf P(Z{=}z)~~\\\mathsf E(h(W)\mid Z{=}z)~=~\sum_w h(w)~\mathsf P(W{=}w\mid Z{=}z)$$
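As a quick concrete instance of the first definition (the numbers here are made up for illustration): if $Z$ takes the values $0$ and $1$ with probability $\tfrac12$ each, and $g(z)=z^2$, then

$$\mathsf E(g(Z)) ~=~ 0^2\cdot\tfrac12 + 1^2\cdot\tfrac12 ~=~ \tfrac12$$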

If you prefer, do it from the outside in.

$\begin{align}\mathsf E(\mathsf E(X\mid Y))&=\sum_y\mathsf E(X\mid Y{=}y)\,\mathsf P(Y{=}y)\\[1ex]&=\sum_y\left(\sum_x x\,\mathsf P(X{=}x\mid Y{=}y)\right)\mathsf P(Y{=}y)\end{align}$

The rest is just distribution, the definition of conditional probability, and the law of total probability.

$\begin{align}\phantom{\mathsf E(\mathsf E(X\mid Y))} &=\sum_x\sum_y x\,\mathsf P(X{=}x\mid Y{=}y)\,\mathsf P(Y{=}y)&&\text{switching order of summation (commutation and association)} \\[2ex] &=\sum_x x\sum_y \mathsf P(X{=}x\mid Y{=}y)\,\mathsf P(Y{=}y)&&\text{distributing out the common factor}\\[1ex]&=\sum_x x\sum_y\mathsf P(X{=}x\cap Y{=}y)&&\text{definition of conditional probability}\\[1ex] &=\sum_x x\,\mathsf P(X{=}x)&&\text{Law of Total Probability}\\[1ex]&=\mathsf E(X)&&\text{definition of expectation}\end{align}$
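If a numeric check helps, here is a minimal sketch in Python; the joint pmf below is made up purely for illustration.

```python
# Minimal numeric check of E[E[X|Y]] = E[X] on a small joint pmf.
# The table below is hypothetical, chosen only so the sums are easy.
joint = {  # (x, y) -> P(X = x and Y = y)
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals via the Law of Total Probability.
p_x = {x: sum(joint[x, y] for y in ys) for x in xs}
p_y = {y: sum(joint[x, y] for x in xs) for y in ys}

# E[X] straight from the definition of expectation.
e_x = sum(x * p_x[x] for x in xs)

# E[X | Y = y] = sum_x x * P(X = x | Y = y),
# where P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y).
def e_x_given(y):
    return sum(x * joint[x, y] / p_y[y] for x in xs)

# Outer expectation: E[E[X|Y]] = sum_y E[X | Y = y] * P(Y = y).
e_e_x_given_y = sum(e_x_given(y) * p_y[y] for y in ys)

print(e_x, e_e_x_given_y)  # equal up to floating-point rounding
```

Note how the outer sum retraces the derivation above: the division by $\mathsf P(Y{=}y)$ inside `e_x_given` cancels against the multiplication by `p_y[y]` in the outer sum, which is exactly the conditional-probability and total-probability steps of the proof.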
