Solved – Expected prediction error – derivation

error, prediction, regression

I am struggling to understand the derivation of the expected prediction error given below (from ESL), in particular equations 2.11 and 2.12 (the conditioning step and the step towards the point-wise minimum). Any pointers or links would be much appreciated.

Below I reproduce the excerpt from ESL, p. 18. The second and third displayed equations are, in order, equations 2.11 and 2.12.


Let $X \in \mathbb{R}^p$ denote a real valued random input vector, and $Y \in \mathbb{R}$ a real valued random output variable, with joint distribution $\text{Pr}(X,Y)$. We seek a function $f(X)$ for predicting $Y$ given values of the input $X$. This theory requires a loss function $L(Y,f(X))$ for penalizing errors in prediction, and by far the most common and convenient is squared error loss: $L(Y,f(X))=(Y-f(X))^2$. This leads us to a criterion for choosing $f$,

$$
\begin{split}
\text{EPE}(f) &= \text{E}(Y - f(X))^2\\
& = \int [y - f(x)]^2 \text{Pr}(dx, dy)
\end{split}
$$

the expected (squared) prediction error. By conditioning on $X$, we can write EPE as

$$
\text{EPE}(f) = \text{E}_X \text{E}_{Y|X}([Y-f(X)]^2|X)
$$

and we see that it suffices to minimize EPE point-wise:

$$
f(x) = \text{argmin}_c \text{E}_{Y|X}([Y-c]^2|X=x)
$$

The solution is

$$
f(x) = \text{E}(Y|X=x)
$$

the conditional expectation, also known as the regression function.

Best Answer

\begin{align*}
\text{EPE}(f) &= \int [y - f(x)]^2 \, \text{Pr}(dx, dy) \\
&= \int [y - f(x)]^2 \, p(x,y) \, dx \, dy \\
&= \int_x \int_y [y - f(x)]^2 \, p(x,y) \, dy \, dx \\
&= \int_x \int_y [y - f(x)]^2 \, p(x) \, p(y|x) \, dy \, dx \\
&= \int_x \left( \int_y [y - f(x)]^2 \, p(y|x) \, dy \right) p(x) \, dx \\
&= \int_x \left( \text{E}_{Y|X}([Y - f(X)]^2 | X = x) \right) p(x) \, dx \\
&= \text{E}_{X} \text{E}_{Y|X}([Y - f(X)]^2 | X)
\end{align*}
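
The chain above covers the conditioning step (2.11). For the point-wise step (2.12), here is a sketch of one standard argument. The shorthand $m(x) = \text{E}(Y|X=x)$ is notation introduced here, not ESL's, and the argument assumes $Y$ has a finite conditional second moment. Fix $x$ and let $c$ be any constant candidate for the value $f(x)$:

\begin{align*}
\text{E}_{Y|X}([Y - c]^2 | X = x) &= \text{E}_{Y|X}([Y - m(x) + m(x) - c]^2 | X = x) \\
&= \text{E}_{Y|X}([Y - m(x)]^2 | X = x) + 2\,(m(x) - c)\,\text{E}_{Y|X}(Y - m(x) | X = x) + (m(x) - c)^2 \\
&= \text{Var}(Y | X = x) + (m(x) - c)^2
\end{align*}

The cross term vanishes because $\text{E}_{Y|X}(Y - m(x) | X = x) = m(x) - m(x) = 0$. The first term does not depend on $c$, and the second is non-negative and zero exactly when $c = m(x)$, so the inner expectation is minimized by $c = \text{E}(Y|X=x)$. Point-wise minimization suffices because the outer integral weights the inner expectation by $p(x) \ge 0$ and $f$ is unconstrained, so $f(x)$ can be chosen separately for each $x$: making the integrand as small as possible at every $x$ makes the whole integral as small as possible.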
