Probability – Understanding the Law of Total Variance as Pythagorean Theorem

conditional-expectation, intuition, variance

Assume $X$ and $Y$ have finite second moments. In the Hilbert space of random variables with finite second moment (with the inner product of $T_1, T_2$ defined by $E(T_1T_2)$, so that $\|T\|^2 = E(T^2)$), we may interpret $E(Y|X)$ as the orthogonal projection of $Y$ onto the subspace of functions of $X$.
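To make the projection picture concrete, here is a small Monte Carlo sketch (the model $Y = X^2 + Z$ is just an illustrative choice of mine, not part of the problem): if $E(Y|X)$ is the projection of $Y$ onto functions of $X$, then the residual $Y - E(Y|X)$ should be orthogonal, under the inner product $E(T_1T_2)$, to every function of $X$.

```python
# Toy model (my own choice): X, Z ~ N(0,1) independent, Y = X**2 + Z,
# so E[Y | X] = X**2 exactly.  The residual Y - E[Y | X] should then be
# orthogonal (in the E[T1*T2] inner product) to any function of X.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
X = rng.standard_normal(n)
Z = rng.standard_normal(n)
Y = X**2 + Z                      # E[Y | X] = X**2 for this model
residual = Y - X**2               # Y - E[Y | X]; here it equals Z

# Inner products E[residual * f(X)] -- all should be ~ 0.
for f, name in [(np.sin, "sin(X)"), (lambda x: x**3, "X^3"), (np.exp, "exp(X)")]:
    print(f"E[(Y - E[Y|X]) * {name}] ≈ {np.mean(residual * f(X)):+.4f}")
```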

We also know that the Law of Total Variance reads
$$Var(Y)=E(Var(Y|X)) + Var(E(Y|X))$$

Is there a way to interpret this law in terms of the geometric picture above? I have been told that the law is the same as the Pythagorean Theorem for the right-angled triangle with sides $Y$, $E(Y|X)$, and $Y-E(Y|X)$. I understand why the triangle is right-angled, but not how the Pythagorean Theorem captures the Law of Total Variance.

Best Answer

I assume that you are comfortable with regarding the right-angled triangle as meaning that $E[Y\mid X]$ and $Y - E[Y\mid X]$ are uncorrelated random variables. (The right angle says these two are orthogonal, i.e. $E\bigl[(Y-E[Y\mid X])\,E[Y\mid X]\bigr] = 0$; since $Y - E[Y\mid X]$ has mean zero, as shown below, orthogonality is the same as zero covariance.) For uncorrelated random variables $A$ and $B$, $$\operatorname{var}(A+B) = \operatorname{var}(A) + \operatorname{var}(B),\tag{1}$$ and so if we set $A = Y - E[Y\mid X]$ and $B = E[Y\mid X]$, so that $A+B = Y$, we get $$\operatorname{var}(Y) = \operatorname{var}(Y-E[Y\mid X]) + \operatorname{var}(E[Y\mid X]).\tag{2}$$ It remains to show that $\operatorname{var}(Y-E[Y\mid X])$ is the same as $E[\operatorname{var}(Y\mid X)]$, so that we can restate $(2)$ as $$\operatorname{var}(Y) = E[\operatorname{var}(Y\mid X)] + \operatorname{var}(E[Y\mid X]),\tag{3}$$ which is the total variance formula.
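As a quick numerical sanity check of $(1)$ and $(2)$, here is a sketch with a toy model of my own choosing: $Y = X^2 + Z$ for independent standard normal $X, Z$, so that $E[Y\mid X] = X^2$ in closed form.

```python
# Check of (2): var(Y) = var(A) + var(B) with A = Y - E[Y|X], B = E[Y|X].
# Toy model: X, Z ~ N(0,1) independent, Y = X**2 + Z, so E[Y | X] = X**2.
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
X = rng.standard_normal(n)
Y = X**2 + rng.standard_normal(n)
B = X**2                            # B = E[Y | X], the projection
A = Y - B                           # A = Y - E[Y | X], the residual

print(f"var(Y)          ≈ {Y.var():.3f}")            # theory: var(X^2) + var(Z) = 2 + 1 = 3
print(f"var(A) + var(B) ≈ {A.var() + B.var():.3f}")  # should match var(Y)
print(f"cov(A, B)       ≈ {np.cov(A, B)[0, 1]:+.4f}")  # ~ 0: the right angle
```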

It is well known that the expected value of the random variable $E[Y\mid X]$ is $E[Y]$, that is, $E\bigl[E[Y\mid X]\bigr] = E[Y]$. So we see that $$E[A] = E\bigl[Y - E[Y\mid X]\bigr] = E[Y] - E\bigl[E[Y\mid X]\bigr] = 0,$$ from which it follows that $\operatorname{var}(A) = E[A^2]$, that is, $$\operatorname{var}(Y-E[Y\mid X]) = E\left[(Y-E[Y\mid X])^2\right].\tag{4}$$ Let $C$ denote the random variable $(Y-E[Y\mid X])^2$ so that we can write $$\operatorname{var}(Y-E[Y\mid X]) = E[C].\tag{5}$$

But $E[C] = E\bigl[E[C\mid X]\bigr]$ where $E[C\mid X] = E\bigl[(Y-E[Y\mid X])^2 \bigm\vert X\bigr]$. Now, given that $X = x$, the conditional distribution of $Y$ has mean $E[Y\mid X=x]$, and so $$E\bigl[(Y-E[Y\mid X=x])^2 \bigm\vert X=x\bigr] = \operatorname{var}(Y\mid X = x).$$ In other words, $E[C\mid X = x] = \operatorname{var}(Y\mid X = x)$, so that the random variable $E[C\mid X]$ is just $\operatorname{var}(Y\mid X)$. Hence, $$E[C] = E\bigl[E[C\mid X]\bigr] = E[\operatorname{var}(Y\mid X)], \tag{6}$$ which upon substitution into $(5)$ shows that $$\operatorname{var}(Y-E[Y\mid X]) = E[\operatorname{var}(Y\mid X)].$$ This makes the right side of $(2)$ exactly what we need, and so we have proved the total variance formula $(3)$.
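And here is a check of the key identity $\operatorname{var}(Y-E[Y\mid X]) = E[\operatorname{var}(Y\mid X)]$ from $(4)$–$(6)$, this time with heteroscedastic noise so that $\operatorname{var}(Y\mid X)$ genuinely varies with $X$ (again a toy model of my own choosing, not part of the original argument).

```python
# Toy model: X, Z ~ N(0,1) independent, Y = X**2 + (1 + |X|) * Z,
# so E[Y | X] = X**2 and var(Y | X) = (1 + |X|)**2 in closed form.
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
X = rng.standard_normal(n)
Z = rng.standard_normal(n)
Y = X**2 + (1 + np.abs(X)) * Z

resid = Y - X**2                    # Y - E[Y | X], mean ~ 0
cond_var = (1 + np.abs(X))**2       # var(Y | X), known exactly here

print(f"var(Y - E[Y|X]) ≈ {resid.var():.3f}")      # theory: 2 + 2*sqrt(2/pi) ≈ 3.60
print(f"E[var(Y | X)]   ≈ {cond_var.mean():.3f}")  # same value, per (4)-(6)
print(f"var(Y)                    ≈ {Y.var():.3f}")  # ≈ 2 + 3.60 = 5.60
print(f"E[var(Y|X)] + var(E[Y|X]) ≈ {cond_var.mean() + (X**2).var():.3f}")
```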
