Bias-Variance Tradeoff – Does Bias Increase with Model Complexity?

Tags: bias, bias-variance-tradeoff, machine-learning

Does bias eventually increase with model complexity?

[Figure: the classic bias-variance tradeoff curve]

Reasoning behind the question:

If I understand it correctly, "bias" measures the discrepancy between the expected value of our model's prediction $\hat{f}$ and the true, underlying function $f$. We can start by creating a simple model with high bias. Bias quickly drops as we train our model on the data, since $\hat{f}$ starts to resemble the true function $f$. However, our model may become too complex, too wiggly compared to the true model, thereby introducing high bias again.

Edit:

I found a clue in An Introduction to Statistical Learning (2nd edition). Figure 2.10 on page 33, together with Figure 2.12 on page 36, visually suggests that the bias actually doesn't eventually increase with model complexity.

[Figure: flexibility vs. bias]

The plot on the left shows different models fitted to the data, with the black line marking the true relationship. The plot on the right corresponds to the data on the left.

Best Answer

I was wondering the same thing. What you have to realize, though, is that the bias is defined via the expectation of the prediction at a given $x$ over all possible training sets $D$:

\begin{equation} D=\left\{\left(x_{1}, y_{1}\right) \ldots,\left(x_{n}, y_{n}\right)\right\} \end{equation}

\begin{equation} \operatorname{Bias}_{D}[\hat{f}(x ; D)]=\mathrm{E}_{D}[\hat{f}(x ; D)]-f(x) \end{equation}

This means that the expectation ranges over different choices of the training set. In your example (Figure 2.10), a specific training set is given, and indeed the prediction of the flexible model (green line) is way off the real function. Now imagine that you create multiple training sets and fit multiple of these green lines. The average of these green lines would almost perfectly align with the black line (the real model), even better than the yellow line would. For the more flexible model, the expectation of the prediction over all possible training sets is closer to the real model, and the bias is therefore lower.
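The averaging argument above can be checked with a small Monte Carlo sketch. The true function, noise level, polynomial degrees, and evaluation point below are all illustrative assumptions, not anything from the book: we repeatedly draw fresh training sets $D$, fit a rigid (degree-1) and a flexible (degree-10) polynomial, and estimate $\operatorname{Bias}_D$ at a fixed $x_0$ by averaging the predictions over the training sets.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical "true" underlying function, chosen for illustration
    return np.sin(2 * np.pi * x)

def fit_predict(degree, x0, n=30, sigma=0.3):
    """Draw one training set D, fit a polynomial of the given degree,
    and return the fitted model's prediction at x0."""
    x = rng.uniform(0.0, 1.0, n)
    y = f(x) + rng.normal(0.0, sigma, n)
    return np.polyval(np.polyfit(x, y, degree), x0)

x0 = 0.25                      # point where we measure bias and variance
results = {}
for degree in (1, 10):         # rigid vs. flexible model
    preds = np.array([fit_predict(degree, x0) for _ in range(2000)])
    # Bias_D = E_D[f_hat(x0; D)] - f(x0), estimated over 2000 training sets
    results[degree] = (preds.mean() - f(x0), preds.var())

for degree, (bias, var) in results.items():
    print(f"degree {degree:2d}: bias ~ {bias:+.3f}, variance ~ {var:.3f}")
```

On a single training set the degree-10 fit looks wild, but averaged over many training sets its mean prediction sits much closer to $f(x_0)$ than the straight line's does; the price it pays shows up in the variance column instead, which is exactly the tradeoff the figures depict.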