Using conditional expectation with MSE function

Tags: conditional-expectation, probability, probability-theory, random-variables

I'm trying to understand a derivation step in applying conditional expectation to the following starting function:

$f_* = E_{P_{(x,y)}} [(y-f(x))^{2}]$

and how, after applying conditional expectation, you end up with this:

$f_* = E_{P_{(x)}} [E_{P_{(y|x)}}[(y-f(x))^{2}|x]]$

I think this is an application of the law of total expectation, but I'm not sure why the subscript gets split up. The variables x and y here are samples drawn from random variables X and Y. If anyone can explain how these two equations are equivalent, I'd greatly appreciate it!

Best Answer

Formally, we are given two random variables $X$, $Y$ and we let $f^* = E[(Y-f(X))^2]$. By the law of iterated expectations, $f^* = E[E[(Y-f(X))^2|X]]$.

Let $\mu$ be a probability kernel such that $\forall A\in \mathcal B(\mathbb R), P(Y\in A|X) = \mu(X,A)$. Then $$E[(Y-f(X))^2|X] = \int (y-f(X))^2d\mu(X,y)$$
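
As a worked special case (an assumption not made in the question): suppose $(X,Y)$ has a joint density $p(x,y)$, with marginal $p(x)$ and conditional density $p(y|x) = p(x,y)/p(x)$. Then the kernel is $\mu(x,A) = \int_A p(y|x)\,dy$, and the split of the subscript is just the factorization $p(x,y) = p(x)\,p(y|x)$ applied inside the double integral:

$$E[(Y-f(X))^2] = \iint (y-f(x))^2\, p(x,y)\,dy\,dx = \int p(x)\left[\int (y-f(x))^2\, p(y|x)\,dy\right]dx$$

The bracketed inner integral is the conditional expectation given $X = x$, and the outer integral averages it over the marginal of $X$.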

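Your $E_{P_{(y|x)}}[(y-f(x))^{2}|x]$ actually refers to $\int (y-f(X))^2d\mu(X,y)$, which is a measurable function of $X$, say $g(X)$.
You get back to $f^*$ by computing $E[g(X)]$, which you wrote as $E_{P_{(x)}} [\ldots]$. The outer subscript is only the marginal law of $X$ because $g(X)$ no longer depends on $Y$: the inner expectation has already averaged $Y$ out.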
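
A quick numerical check may make the identity more concrete. The sketch below is only an illustration under assumptions that are not in the question: the joint pmf `joint` and the predictor `f` are made-up values, and the helper names are hypothetical. It computes the expectation once as a single sum over $(x,y)$ pairs and once as an outer sum over $x$ of an inner sum over $y$ weighted by $P(y|x)$.

```python
from math import isclose

# Hypothetical joint pmf P(X=x, Y=y) on a small grid (illustrative values only).
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.05,
    (2, 0): 0.15, (2, 1): 0.25,
}

def f(x):
    """An arbitrary predictor; any function of x would do here."""
    return 0.3 * x

def joint_expectation(joint, f):
    """E_{P(x,y)}[(y - f(x))^2] as one sum over all (x, y) pairs."""
    return sum(p * (y - f(x)) ** 2 for (x, y), p in joint.items())

def iterated_expectation(joint, f):
    """E_{P(x)}[ E_{P(y|x)}[(y - f(x))^2 | x] ]: inner sum over y, outer over x."""
    # Marginal P(X = x), obtained by summing the joint pmf over y.
    marginal = {}
    for (x, _), p in joint.items():
        marginal[x] = marginal.get(x, 0.0) + p

    total = 0.0
    for x0, px in marginal.items():
        # g(x0) = sum_y P(y | x0) * (y - f(x0))^2, with P(y | x0) = P(x0, y) / P(x0).
        g = sum((p / px) * (y - f(x0)) ** 2
                for (x, y), p in joint.items() if x == x0)
        total += px * g  # outer expectation over the marginal of X
    return total

lhs = joint_expectation(joint, f)
rhs = iterated_expectation(joint, f)
print(lhs, rhs, isclose(lhs, rhs))
```

The two results agree up to floating-point rounding because `px * (p / px)` recovers the joint weight `p`, which is the discrete version of $P(x,y) = P(x)\,P(y|x)$.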
