Solved – Curse of dimensionality: why is it a problem that most points are near the edge

high-dimensional, machine learning, predictive-models

ESL, p. 42 says:

Hence most data points are closer to the boundary of the sample space
than to any other data point. The reason that this presents a problem
is that prediction is much more difficult near the edges of the
training sample. One must extrapolate from neighboring sample points
rather than interpolate between them.

I understand that extrapolation is harder than interpolation. And I understand that if we choose a point to predict and it happens to be near the edge of the sample space, we are more likely to have to extrapolate.

But a large part of the training data set is also near the edge. And the more training points nearby, the better chance we have to interpolate instead of extrapolate. Doesn't the fact that many training points are near the edge reduce or even eliminate the problem that many points we want to predict for are near the edge?

Best Answer

The problem is that "the edge" is not just one small place in $d$-dimensional space. "The edge" is a big part of your volume, and its share of the total volume grows very quickly with $d$. Thus, even if more and more training data points lie near "the edge", the density of training data there will still decrease very quickly as $d$ increases.

As a little visualization, consider a two-dimensional square with sides of length 2, and inscribe a circle of radius 1 within it. Let's call the part of the square outside the circle "the edge". "The edge" has area $4-\pi$, i.e. it makes up a fraction $\frac{4-\pi}{4}\approx 21\%$ of the square, and it consists of four disjoint pieces.

Now consider the same situation in three dimensions. Then in four. And so forth. You can show that "the edge" takes up as large a fraction of the volume of the $d$-dimensional cube as we want, if we just increase $d$ sufficiently. "The edge" no longer falls apart into disjoint pieces for $d\geq 3$, but your training data will get lost in it.
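As a quick sanity check of that claim, here is a minimal sketch (assuming only the standard closed-form volume of a $d$-dimensional ball) that computes the fraction of the cube $[-1,1]^d$ lying outside the inscribed unit ball, i.e. the volume share of "the edge":

```python
from math import pi, gamma

def edge_fraction(d):
    """Fraction of the cube [-1, 1]^d lying outside the inscribed unit ball."""
    ball_volume = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    cube_volume = 2 ** d                            # volume of [-1, 1]^d
    return 1 - ball_volume / cube_volume

for d in (2, 3, 5, 10, 20):
    print(f"d = {d:2d}: edge fraction ~ {edge_fraction(d):.4f}")
# d =  2: ~0.21,  d =  3: ~0.48,  d =  5: ~0.84,  d = 10: ~0.9975,  d = 20: ~1.0000
```

Already by $d=10$, essentially all of the cube's volume is "the edge".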

Alternatively, draw $n$ points uniformly at random in the $d$-dimensional unit cube, for increasing $d$, and look at the average pairwise distance, or, more to the point, at the average distance from each point to its nearest neighbour. You will see that these distances grow quickly with $d$. How quickly would you need to increase $n$ with $d$ to keep that nearest-neighbour distance constant? Very quickly indeed.
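A minimal sketch of that experiment (a Monte Carlo estimate, assuming NumPy and SciPy are available; the exact numbers are random and will vary) measures the average nearest-neighbour distance among $n$ uniform points as $d$ grows:

```python
import numpy as np
from scipy.spatial import cKDTree

def avg_nn_distance(n, d, rng):
    """Mean nearest-neighbour distance among n uniform random points in [0, 1]^d."""
    points = rng.random((n, d))
    dists, _ = cKDTree(points).query(points, k=2)  # k=2: the point itself plus its nearest other point
    return dists[:, 1].mean()

rng = np.random.default_rng(0)
for d in (1, 2, 5, 10, 50, 100):
    print(f"d = {d:3d}: average nearest-neighbour distance ~ {avg_nn_distance(1000, d, rng):.3f}")
# With n fixed at 1000, the nearest neighbour drifts away rapidly as d grows;
# keeping it equally close requires n to grow extremely fast with d.
```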

Bottom line: "the more training points nearby" is where your argument goes astray. In high dimensions, there are very few points nearby.
