Different definitions of “curse of dimensionality”

Tags: definition, high-dimensional

I have read two definitions of the curse of dimensionality. The first seems to be the more widespread one (I have seen people refer to it in other questions on stats.SE); the other I encountered only recently. Here they are:

  1. The curse of dimensionality was coined to indicate that the number of samples needed to estimate an arbitrary function with a given level of accuracy grows exponentially with the number of variables (i.e., dimensions) that it comprises
    [H. Samet, Foundations of Multidimensional and Metric Data Structures, p. 486]

  2. The curse of dimensionality is a name given to the situation where all or some of the important features of a dataset sharply concentrate near their median (or mean) values and thus become non-discriminating.
    [V. Pestov, An axiomatic approach to intrinsic dimension of a dataset, Neural Networks 21 (2008) 204–213]

What is the relationship between these definitions? Does the second one agree with the first?

Best Answer

The curse of dimensionality is not a mathematical object like a derivative that needs a formal, unambiguous definition. It is an umbrella term for two issues encountered when working with high-dimensional data. Strictly speaking, the issues are separate, and both are always present; depending on your algorithm's vulnerability to them, you may have to deal with both, neither, or one but not the other.

I have seen the first definition in slightly different versions. As quoted here, it does not specify what is meant by estimating the function with a given level of accuracy. In any case, if you want to know the value of a function at equidistant points, you need more points the more dimensions you have: 10 points sample the line from 0 to 1 at a spacing of 0.1, but the unit square needs 100 points, the unit cube 1000, and so on. This affects only algorithms that need sample points in all directions of the feature space.
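A minimal Python sketch of this counting argument (the 0.1 grid spacing is just the example from the paragraph above, not anything canonical):

```python
def grid_points(dim, spacing=0.1):
    """Points needed to sample the unit hypercube on a regular grid."""
    per_axis = round(1.0 / spacing)  # 10 points per axis at spacing 0.1
    return per_axis ** dim           # grows exponentially with dim

for d in (1, 2, 3, 10):
    print(f"{d} dimension(s): {grid_points(d):,} points")
# 1 dimension(s): 10 points
# 2 dimension(s): 100 points
# 3 dimension(s): 1,000 points
# 10 dimension(s): 10,000,000,000 points
```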

The second definition describes what is also called the "distance concentration effect". Many, but not all, algorithms are affected by it. It comes into play when an algorithm uses k nearest neighbors or some other technique relying on distance measures. I don't remember in which canonical reference I picked up the term distance concentration, but here are two papers that discuss the issue in detail: "When is nearest neighbor meaningful?" and "A survey on unsupervised outlier detection in high-dimensional numerical data".
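The effect is easy to see in a quick simulation; this is only an illustrative sketch with uniform random data, not code from either paper:

```python
import numpy as np

rng = np.random.default_rng(0)

for dim in (2, 10, 100, 1000):
    # Euclidean distances from a random query to 1000 random uniform points
    points = rng.random((1000, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:4d}  relative contrast={contrast:.3f}")
```

As the dimension grows, the relative contrast shrinks toward zero: all distances become nearly equal, which is exactly the non-discriminating behavior the second definition describes.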

In practice, the two issues occur simultaneously; the question is whether your algorithm is vulnerable to them. For example, with classic k-nearest-neighbor learning you want training examples in all regions of the space that your test examples could come from, so you are vulnerable to the curse of dimensionality in the first sense. You are also vulnerable in the second sense, since you need to compute distances to establish the nearest neighbors, which makes little sense once all distances converge to the same number.

Both definitions are in use. I suggest you infer from context which meaning a source uses, and be specific about which one you mean when using the term.

PS. I've also seen the term used informally when an algorithm's computational complexity increases with the number of dimensions. The term then simply refers to the increasing computational complexity.
