[Math] Minimizing the expectation of the loss function

conditional-expectation, expected-value, mean-square-error, optimization, probability

I was reading Elements of Statistical Learning and found the following passage in the Statistical Decision Theory section, which I did not understand.

The expected (squared) prediction error is $\operatorname{EPE}(f) = \mathrm{E}\,[Y - f(X)]^2$. By conditioning on $X$, we can
write EPE as
$$\operatorname{EPE}(f) = \mathrm{E}_X \mathrm{E}_{Y|X}\big([Y - f(X)]^2 \mid X\big) \qquad (2.11)$$
and we see that it suffices to minimize EPE pointwise:
$$f(x) = \operatorname*{arg\,min}_c \mathrm{E}_{Y|X}\big([Y - c]^2 \mid X = x\big)$$


Can someone explain to me, with proper mathematical formulae and some intuition, what exactly happened here? Does conditioning on $X$ mean treating $x$ as constant in some sense? If possible, please explain using a density and the definition of expectation.

Best Answer

Given $X = x$, you can always form a function that takes an arbitrary value $f(x)$ at $x$. Note that the function need not even be differentiable at $x$. That is why the author replaced $f(x)$ with the free variable $c$ and minimized the expression over $c$: the result is the optimal value of $f(x)$, the one that minimizes the loss when $X = x$.
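To see the conditioning step concretely with densities (assuming the joint density exists and factors as $p(x, y) = p(y \mid x)\,p(x)$), write the expectation as an integral:
$$\operatorname{EPE}(f) = \iint [y - f(x)]^2\, p(x, y)\, \mathrm{d}y\, \mathrm{d}x = \int \left( \int [y - f(x)]^2\, p(y \mid x)\, \mathrm{d}y \right) p(x)\, \mathrm{d}x.$$
The inner integral is $\mathrm{E}_{Y|X}\big([Y - f(X)]^2 \mid X = x\big)$, a nonnegative function of $x$ alone; once $x$ is fixed, $f(x)$ is just a number. Since $p(x) \ge 0$, choosing $f(x)$ to minimize the inner integral at every single $x$ minimizes the outer integral as well, which is exactly the pointwise minimization in the second expression.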

Please comment if you want me to show in more detail how the authors derived the optimal value $f(x) = \mathrm{E}(Y \mid X = x)$ from the minimization in the second expression.
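In case it helps, here is a short sketch of that derivation. With $x$ fixed, the objective is a quadratic in $c$:
$$\mathrm{E}\big([Y - c]^2 \mid X = x\big) = \mathrm{E}(Y^2 \mid X = x) - 2c\,\mathrm{E}(Y \mid X = x) + c^2.$$
Setting the derivative with respect to $c$ to zero gives $c = \mathrm{E}(Y \mid X = x)$, and the second derivative is $2 > 0$, so this is indeed the minimum. Hence the optimal predictor is the conditional mean, $f(x) = \mathrm{E}(Y \mid X = x)$, the regression function.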
