Solved – Expectation of squared error

least squares, machine learning

In machine learning, we let $X$ be a real-valued input vector and $Y$ be a real-valued output, with joint distribution $P(X,Y)$. We are looking for a function $f(X)$ for predicting $Y$ given the values of $X$. In most machine learning problems, we incur a penalty whenever the prediction is wrong. Here the loss function is squared error: $L(Y,f(X))=(Y-f(X))^2$.

The expected squared prediction error is:

$$EPE(f) = E(Y-f(X))^2 = \displaystyle\int [y-f(x)]^2P(dx,dy). $$


The book I'm reading says that by conditioning on $X$, we can write the EPE as $EPE(f) = E_XE_{Y|X}\left((Y-f(X))^2\mid X\right)$, and that minimizing this pointwise eventually gives the solution $f(x) = E(Y\mid X=x)$.

I'm not sure I follow what happened in these steps. Please help me with the derivation.
Thanks

Best Answer

The expected value of $(Y-a)^2$ is given by $$E[(Y-a)^2] = E\left[(Y-E[Y])^2\right] + \left(E[Y]-a\right)^2 = \operatorname{var}(Y) + \left(E[Y]-a\right)^2.\tag{1}$$ Regarded as a function of $a$, the smallest possible value of $E[(Y-a)^2]$ is $\operatorname{var}(Y)$, and this minimum value occurs exactly when $a$ equals $E[Y]$.
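
To see where $(1)$ comes from, write $Y - a = (Y - E[Y]) + (E[Y] - a)$ and expand the square; the cross term vanishes because $E[Y] - a$ is a constant and $E[Y - E[Y]] = 0$: $$\begin{align} E[(Y-a)^2] &= E\left[(Y-E[Y])^2\right] + 2\left(E[Y]-a\right)E\left[Y-E[Y]\right] + \left(E[Y]-a\right)^2 \\ &= \operatorname{var}(Y) + \left(E[Y]-a\right)^2. \end{align}$$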

Now, when it is known that $X$ has value $x$, all the above applies in exactly the same way except that we are now looking at the conditional EPE $E[(Y-a)^2\mid X = x]$ instead of the unconditional EPE $E[(Y-a)^2]$. Conditioning everything on $X=x$, the formula $(1)$ becomes $$\begin{align} E[(Y-a)^2\mid X=x] &= E\left[\left(Y-E[Y\mid X=x]\right)^2\mid X=x\right] + \left(E[Y\mid X=x]-a\right)^2\\ &= \operatorname{var}(Y\mid X=x) + \left(E\left[Y\mid X=x\right]-a\right)^2 \end{align}$$ Thus, the minimum value of $E[(Y-a)^2\mid X =x]$ occurs when $a$ is chosen to be the conditional mean $E[Y\mid X = x]$ of $Y$ instead of the unconditional mean $E[Y]$. The minimum conditional EPE is, of course, the conditional variance $\operatorname{var}(Y\mid X=x)$ of $Y$ given $X = x$ instead of the unconditional variance $\operatorname{var}(Y)$.
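
Putting these pieces together answers the original question. Factoring the joint distribution as $P(dx,dy) = P(dy\mid x)\,P(dx)$ (equivalently, using the tower property of expectation), the EPE becomes $$EPE(f) = E_X\left[E_{Y|X}\left((Y-f(X))^2\mid X\right)\right].$$ Once we condition on $X = x$, the predictor $f(X)$ is just the number $a = f(x)$, so the inner expectation can be minimized pointwise, separately for each $x$. By the argument above, the pointwise minimizer is $f(x) = E[Y\mid X=x]$, and the minimum achievable EPE is $E_X\left[\operatorname{var}(Y\mid X)\right]$.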
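
If it helps to see this numerically, here is a minimal Monte Carlo sketch (not from the original post; the toy model $Y = X^2 + \varepsilon$ and all variable names are my own choices) comparing the empirical EPE of the conditional-mean predictor against two competitors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution (my choice, just for illustration):
# X uniform on {0, 1, 2}, and Y = X^2 + standard normal noise,
# so E[Y | X = x] = x^2 and var(Y | X = x) = 1.
n = 200_000
x = rng.integers(0, 3, size=n)
y = x.astype(float) ** 2 + rng.normal(size=n)

def epe(f):
    """Monte Carlo estimate of EPE(f) = E[(Y - f(X))^2]."""
    return np.mean((y - f(x)) ** 2)

# Empirical conditional means E[Y | X = x] for each value of x.
cond_mean = np.array([y[x == v].mean() for v in range(3)])

def f_star(t):
    return cond_mean[t]  # the conditional-mean predictor

print(epe(f_star))                                # ~ 1.0  (= E[var(Y|X)])
print(epe(lambda t: np.full(t.shape, y.mean())))  # ~ 3.9  (= var(Y))
print(epe(lambda t: cond_mean[t] + 0.5))          # ~ 1.25 (adds bias^2 = 0.25)
```

The first value is the smallest, matching the claim that $f(x) = E[Y\mid X=x]$ minimizes the EPE, and the third line illustrates identity $(1)$: shifting the predictor by a constant $0.5$ raises the error by about $0.5^2 = 0.25$.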