Solved – Relationship between bias, variance, and regularization

bias, deep learning, overfitting, regularization, variance

In Goodfellow et al.'s Deep Learning, the authors write on page 222:

"… the model family being trained either (1) excluded the true
data-generating process – corresponding to underfitting and inducing
bias, or (2) matched the true data-generating process, or (3) included
the generating process but also many other possible generating
processes – the overfitting regime where variance rather than bias
dominates the estimation error. The goal of regularization is to take
a model from the third into the second regime."

I'm wondering why underfitting induces bias and why overfitting increases variance. The connection between the two is not clear to me.

Also, I come from a Bayesian inference background, and I think of regularization as a sort of prior that causes the parameters to shift towards a prior belief. With that interpretation, why can regularization not be used to correct both overfitting and underfitting?
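To make that interpretation concrete, here is the standard correspondence for the linear-Gaussian case (a special case chosen purely for illustration): with likelihood $y \mid X, w \sim \mathcal{N}(Xw, \sigma^2 I)$ and prior $w \sim \mathcal{N}(0, \tau^2 I)$,

$$
\hat{w}_{\text{MAP}}
= \arg\max_w \, \bigl[\log p(y \mid X, w) + \log p(w)\bigr]
= \arg\min_w \, \bigl[\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2\bigr],
\qquad \lambda = \sigma^2/\tau^2 .
$$

On this reading the prior only ever shrinks the estimate towards the prior mean (zero here), so I can see how it tames an over-flexible model; what I don't see is any mechanism by which it could add flexibility the model family lacks, which is what correcting underfitting would seem to require.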

Edit: I'm also confused about where the bias and variance come in. Are they defined with respect to the parameter values or to the predictions? If the former, how can you even compare a fitted model to the true model (if you knew it) in terms of bias or variance when the fitted model has a different number of parameters than the true model, and/or when those parameters weight different features computed from the raw predictors? If the latter, how can you be certain that an underfitted model will have high bias (and vice versa)? For example, in Figure 2.11 of the ISLR book referenced in @Alex's answer, it looks like, for the linear model, half the predictions lie above the line and half below, so wouldn't the bias be zero?
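To probe that last point numerically, here is a small simulation sketch (the sinusoidal ground truth and all settings are arbitrary illustrative choices, not from the book). The bias in the decomposition is pointwise: $E[\hat{f}(x)] - f(x)$ at each fixed $x$, averaged over repeated training sets, not the average residual of a single fit over all $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical nonlinear ground truth, chosen for illustration.
    return np.sin(2 * np.pi * x)

x_test = np.linspace(0.0, 1.0, 50)
n_train, n_reps, sigma = 30, 2000, 0.3

# Collect the linear fit's predictions at x_test over many training sets.
preds = np.empty((n_reps, x_test.size))
for r in range(n_reps):
    x = rng.uniform(0.0, 1.0, n_train)
    y = f(x) + rng.normal(0.0, sigma, n_train)
    coef = np.polyfit(x, y, deg=1)        # underfit: a straight line
    preds[r] = np.polyval(coef, x_test)

pointwise_bias = preds.mean(axis=0) - f(x_test)
print("signed bias averaged over x:", pointwise_bias.mean())          # ~ 0 by symmetry
print("mean squared pointwise bias:", (pointwise_bias ** 2).mean())   # clearly > 0
```

The signed bias averaged over $x$ can indeed come out near zero, matching the half-above, half-below impression from the figure, while the squared pointwise bias that actually enters the expected test error stays well above zero.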

Best Answer

You will find the answer in the ISLR book, Section 2.2.1.

In short, see the two figures from that section, reproduced from the book:

[Figure: an inflexible straight-line fit to a nonlinear dataset]

[Figure: an overly flexible, wiggly fit to a roughly linear dataset]

A model that is too flexible (overfitting) loses generalisation ability by memorising too much noise, which increases variance (second figure: the wiggly green curve chasing a roughly linear dataset), while a model that is too inflexible (underfitting) cannot describe the dataset properly (first figure: the beige straight line trying to approximate a nonlinear dataset).
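A minimal simulation sketch of that picture (the sinusoidal ground truth, polynomial degrees, and sample sizes below are illustrative assumptions, not taken from ISLR): refit an inflexible and a flexible model on many fresh training sets, then split the prediction error at held-out points into squared bias and variance.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Smooth nonlinear ground truth standing in for the book's dataset.
    return np.sin(2 * np.pi * x)

x_test = np.linspace(0.05, 0.95, 40)   # held-out evaluation points
n_train, n_reps, sigma = 30, 2000, 0.3

for deg in (1, 9):  # degree 1 ~ the inflexible line, degree 9 ~ the wiggly fit
    preds = np.empty((n_reps, x_test.size))
    for r in range(n_reps):
        x = rng.uniform(0.0, 1.0, n_train)
        y = f(x) + rng.normal(0.0, sigma, n_train)
        preds[r] = np.polyval(np.polyfit(x, y, deg), x_test)
    bias2 = ((preds.mean(axis=0) - f(x_test)) ** 2).mean()  # squared bias
    var = preds.var(axis=0).mean()                          # variance
    print(f"degree {deg}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```

The expected pattern is large bias^2 with small variance for the degree-1 fit (underfitting) and near-zero bias^2 with much larger variance for the degree-9 fit (overfitting). Adding an L2 penalty to the flexible fit shrinks its variance, which is exactly the move from the third regime to the second in the Goodfellow quote; no penalty can remove the straight line's bias.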
