Solved – Modern machine learning and the bias-variance trade-off

bias-variance tradeoff, interpolation, machine learning

I stumbled upon the paper Reconciling modern machine learning practice
and the bias-variance trade-off
and do not completely understand how the authors justify the double descent risk curve (see below) described in their paper.

[Figure: the double descent risk curve from the paper, showing test risk as a function of function class capacity.]

In the introduction they say:

By considering larger function classes, which contain more candidate
predictors compatible with the data, we are able to find interpolating
functions that have smaller norm and are thus "simpler". Thus
increasing function class capacity improves performance of classifiers.

From this I can understand why the test risk decreases as a function of the function class capacity.

What I don't understand with this justification, however, is why the test risk first increases up to the interpolation point and only then decreases again. And why does the interpolation point occur exactly where the number of data points $n$ equals the number of function parameters $N$?

I would be happy if someone could help me out here.

Best Answer

The main point of Belkin's double descent is that, at the interpolation threshold, i.e. the smallest model capacity at which the training data can be fit exactly, the set of interpolating solutions is very constrained. The model has to "stretch" to fit the data exactly with such limited capacity.
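To make that concrete, here is a minimal sketch (a toy setup of my own, not an experiment from the paper) with a random-feature linear model whose width $N$ equals the number of training points $n$: the interpolating solution is essentially unique, and its coefficient norm is often very large.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20                                    # number of training points
x = rng.uniform(-1, 1, size=n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

N = n                                     # width exactly at the interpolation threshold
w = rng.normal(size=N)                    # hypothetical random Fourier feature frequencies
b = rng.uniform(0, 2 * np.pi, size=N)
Phi = np.cos(x[:, None] * w[None, :] + b[None, :])   # n x N feature matrix

theta = np.linalg.solve(Phi, y)           # the (generically unique) interpolating solution
print("max train residual:", np.abs(Phi @ theta - y).max())   # ~0: training data is interpolated
print("coefficient norm  :", np.linalg.norm(theta))           # often very large at N = n
```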

When you increase capacity beyond that threshold, the space of interpolating solutions opens up, allowing optimization to reach lower-norm interpolating solutions. These tend to generalize better, and that is why you get the second descent in test risk.
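Continuing the same toy setup, the sketch below (again my own illustration, not the paper's experiments) sweeps the width $N$ past $n$ and takes the minimum-norm least-squares fit at each width. Typically the test error peaks near $N = n$ and falls again as $N$ grows, while the norm of the interpolating solution shrinks; with a single random seed the curve can be noisy.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(m):
    x = rng.uniform(-1, 1, size=m)
    return x, np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=m)

def features(x, w, b):
    return np.cos(x[:, None] * w[None, :] + b[None, :])   # random Fourier features

n = 20
x_tr, y_tr = make_data(n)
x_te, y_te = make_data(500)

for N in [5, 10, 15, 20, 40, 100, 500]:                   # sweep capacity through N = n = 20
    w = rng.normal(size=N)
    b = rng.uniform(0, 2 * np.pi, size=N)
    Phi_tr, Phi_te = features(x_tr, w, b), features(x_te, w, b)
    # lstsq returns the minimum-norm solution when the system is underdetermined (N > n)
    theta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    test_mse = np.mean((Phi_te @ theta - y_te) ** 2)
    print(f"N={N:4d}  test MSE={test_mse:10.3f}  ||theta||={np.linalg.norm(theta):10.2f}")
```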
