Bias-Variance Tradeoff – Understanding Derivation in Machine Learning

bias-variance tradeoff, machine learning, mse, unbiased-estimator

I am reading the chapter on the bias-variance tradeoff in The Elements of Statistical Learning and I don't understand the formula on page 29. Let the data arise from a model such that $$ Y = f(x)+\varepsilon,$$ where $\varepsilon$ is a random variable with expected value $\hat{\varepsilon} = E[\varepsilon]=0$ and variance $E[(\varepsilon - \hat\varepsilon)^2]=E[\varepsilon^2]=\sigma^2$. Let the expected squared error of the model be
$$
E[(Y-f_k(x))^2]
$$

where $f_k(x)$ is the prediction of our learner $k$ at $x$. According to the book, the error decomposes as
$$
\newcommand{\Bias}{\rm Bias} \newcommand{\Var}{\rm Var}
E[(Y-f_k(x))^2]=\sigma^2+\Bias(f_k(x))^2+\Var(f_k(x)).
$$

My question is: Why is the bias term not $0$? Expanding the formula for the error, I get
\begin{align}
E[(Y-f_k(x))^2] &= E[(f(x)+\varepsilon-f_k(x))^2] \\
&= E[(f(x)-f_k(x))^2] + 2E[(f(x)-f_k(x))\varepsilon] + E[\varepsilon^2] \\
&= \Var(f_k(x)) + 2E[(f(x)-f_k(x))\varepsilon] + \sigma^2.
\end{align}

Since $\varepsilon$ is independent of $f(x)-f_k(x)$, the cross term vanishes: $2E[(f(x)-f_k(x))\varepsilon] = 2E[f(x)-f_k(x)]\,E[\varepsilon]=0$.
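For what it's worth, a quick Monte Carlo check seems to back this up (a minimal sketch with my own toy choices, none of which are from the book: $f(x)=\sin(2\pi x)$, a least-squares line as the learner $f_k$, and $\sigma=1$):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x0 = 0.5                                  # fixed query point x
f = lambda x: np.sin(2 * np.pi * x)       # true regression function f

cross = []
for _ in range(20_000):
    # a fresh training set gives a fresh realisation of the learner f_k
    x_tr = rng.uniform(0.0, 1.0, 20)
    y_tr = f(x_tr) + rng.normal(0.0, sigma, 20)
    slope, intercept = np.polyfit(x_tr, y_tr, 1)   # toy learner: least-squares line
    fk_x0 = slope * x0 + intercept

    eps = rng.normal(0.0, sigma)          # noise in a new observation Y at x0
    cross.append((f(x0) - fk_x0) * eps)

print(np.mean(cross))                     # close to 0, matching the independence argument
```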

Where am I wrong?

Best Answer

You are not wrong, but you made an error in one step, since $E[(f(x)-f_k(x))^2] \ne \text{Var}(f_k(x))$. Rather, $E[(f(x)-f_k(x))^2]$ is the mean squared error, $\text{MSE}(f_k(x)) = \text{Var}(f_k(x)) + \text{Bias}^2(f_k(x))$.

\begin{align*}
E[(Y-f_k(x))^2] &= E[(f(x)+\varepsilon-f_k(x))^2] \\
&= E[(f(x)-f_k(x))^2]+2E[(f(x)-f_k(x))\varepsilon]+E[\varepsilon^2] \\
&= E\left[\left(f(x) - E[f_k(x)] + E[f_k(x)]-f_k(x) \right)^2 \right] + 2E[(f(x)-f_k(x))\varepsilon]+\sigma^2 \\
&= \left(f(x) - E[f_k(x)]\right)^2 + E\left[\left(E[f_k(x)]-f_k(x)\right)^2\right] + \sigma^2 \\
&= \text{Bias}^2(f_k(x)) + \text{Var}(f_k(x)) + \sigma^2.
\end{align*}

In the fourth line, the cross term from expanding the square vanishes (see the note below), and $2E[(f(x)-f_k(x))\varepsilon]=0$ by the independence argument you already gave.

Note: $E\left[(f_k(x)-E[f_k(x)])(f(x)-E[f_k(x)])\right] = (f(x)-E[f_k(x)])\,E\left[f_k(x)-E[f_k(x)]\right] = 0,$ since $f(x)-E[f_k(x)]$ is a constant and $E\left[f_k(x)-E[f_k(x)]\right]=0$.
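If you want to see the decomposition numerically, here is a small simulation sketch (my own toy setup, not from the book: $f(x)=\sin(2\pi x)$, a deliberately biased least-squares line as $f_k$, $\sigma=0.5$, and a fixed query point $x_0$) comparing $E[(Y-f_k(x_0))^2]$ with $\sigma^2 + \text{Bias}^2 + \text{Var}$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n_train, n_rep = 0.5, 30, 20_000
x0 = 0.25                                      # fixed query point
f = lambda x: np.sin(2 * np.pi * x)            # true regression function

preds, sq_err = [], []
for _ in range(n_rep):
    # train the (biased) linear learner on a fresh training set
    x_tr = rng.uniform(0.0, 1.0, n_train)
    y_tr = f(x_tr) + rng.normal(0.0, sigma, n_train)
    slope, intercept = np.polyfit(x_tr, y_tr, 1)
    fk_x0 = slope * x0 + intercept
    preds.append(fk_x0)

    # squared error against a fresh, independent observation Y at x0
    y_new = f(x0) + rng.normal(0.0, sigma)
    sq_err.append((y_new - fk_x0) ** 2)

preds = np.array(preds)
bias2 = (f(x0) - preds.mean()) ** 2            # squared bias of f_k(x0)
var = preds.var()                              # variance of f_k(x0) over training sets
print("E[(Y - f_k(x0))^2]     :", np.mean(sq_err))
print("sigma^2 + Bias^2 + Var :", sigma**2 + bias2 + var)
```

The two printed numbers agree up to Monte Carlo error. Replacing the line fit with a flexible enough learner shrinks the $\text{Bias}^2$ term while the $\sigma^2$ term stays, which is why the bias term is not $0$ in general.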
