In general, the curse of dimensionality makes searching through a space much more difficult, and it affects the majority of algorithms that "learn" by partitioning their vector space. The higher the dimensionality of our optimization problem, the more data we need to fill the space we are optimizing over.
Generalized Linear Models
Linear models suffer immensely from the curse of dimensionality. Linear models partition the space into a single linear plane. Even if we are not looking to directly compute $$\hat{\beta} = (X'X)^{-1}X'y$$ the problem posed is still very sensitive to collinearity and can be considered "ill conditioned" without some type of regularization. In very high dimensional spaces, there is more than one plane that can be fitted to your data, and without the proper type of regularization the model can behave very poorly. Specifically, what regularization does is try to force one unique solution to exist. Both L1 and squared L2 regularization try to minimize the weights, and can be interpreted as selecting the model with the smallest weights as the most "correct" model. This can be thought of as a mathematical formulation of Occam's razor.
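As a minimal sketch of this, assuming a purely synthetic dataset with arbitrary dimensions (nothing here is tied to a specific application), here is how the normal-equations problem becomes underdetermined when features outnumber samples, and how a squared-L2 (ridge) penalty restores a unique solution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 samples but 200 features, so X'X cannot be full rank.
n, p = 50, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + rng.normal(scale=0.1, size=n)

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # at most 50 < 200: no unique OLS solution

# Adding lam * I (ridge / squared-L2 regularization) makes the system
# invertible, i.e. it forces one unique solution to exist.
lam = 1.0
beta_ridge = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
print(beta_ridge[:5])  # estimates for the few truly non-zero coefficients
```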
Decision Trees
Decision trees also suffer from the curse of dimensionality. Decision trees directly partition the sample space at each node. As the dimensionality of the sample space increases, the distances between data points increase, which makes it much harder to find a "good" split.
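As a rough, hedged illustration (synthetic data from scikit-learn, numbers chosen arbitrarily), you can pad the same handful of informative features with more and more pure-noise dimensions and watch how a single tree's cross-validated accuracy tends to deteriorate:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same 5 informative features each time, plus an increasing number of noise features.
for n_noise in (0, 50, 500):
    X, y = make_classification(n_samples=500, n_features=5 + n_noise,
                               n_informative=5, n_redundant=0, random_state=0)
    tree = DecisionTreeClassifier(random_state=0)
    score = cross_val_score(tree, X, y, cv=5).mean()
    print(n_noise, round(score, 3))
```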
Random Forests
Random Forests use a collection of decision trees to make their predictions. But instead of using all the features of your problem, individual trees use only a subset of the features. This shrinks the space that each tree optimizes over and can help combat the curse of dimensionality.
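In scikit-learn, for example, this per-tree feature subsetting is controlled by the max_features argument of RandomForestClassifier; a small hypothetical example (synthetic data, arbitrary sizes):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data: 500 features, only 10 of which are informative.
X, y = make_classification(n_samples=1000, n_features=500,
                           n_informative=10, random_state=0)

# max_features="sqrt": each split considers only ~22 of the 500 features,
# so every individual tree optimizes over a much smaller space.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print(cross_val_score(rf, X, y, cv=5).mean())
```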
Boosted Trees
Boosting algorithms such as AdaBoost suffer from the curse of dimensionality and tend to overfit if regularization is not utilized. I won't go into depth, because the post Is AdaBoost less or more prone to overfitting? explains the reasons better than I could.
Neural Networks
Neural networks are odd in the sense that they both are and are not impacted by the curse of dimensionality, depending on the architecture, activations, depth, etc. So, to reiterate, the curse of dimensionality is the problem that a huge number of points is necessary to cover a high-dimensional input space. One way to interpret deep neural networks is to think of all layers except the very last one as performing a complicated projection of a high-dimensional manifold onto a lower-dimensional manifold, on top of which the last layer classifies. For example, in a convolutional network for classification where the last layer is a softmax layer, we can interpret the architecture as doing a non-linear projection onto a smaller dimension and then performing multinomial logistic regression (the softmax layer) on that projection. So in a sense the compressed representation of our data allows us to circumvent the curse of dimensionality. Again, this is one interpretation; in reality the curse of dimensionality does impact neural networks, but not to the same degree as the models outlined above.
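A toy NumPy forward pass (purely illustrative, with made-up layer sizes and random weights) makes this interpretation concrete: everything before the last weight matrix acts as a non-linear projection down to a small representation, and the final softmax layer is just multinomial logistic regression on that representation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

X = rng.normal(size=(5, 1024))            # a batch of 1024-dimensional inputs
W1 = rng.normal(size=(1024, 256)) * 0.01  # first projection layer
W2 = rng.normal(size=(256, 32)) * 0.01    # second layer: a 32-dimensional representation
W_out = rng.normal(size=(32, 10)) * 0.01  # softmax layer = multinomial logistic
                                          # regression on the 32-dim representation

h = relu(relu(X @ W1) @ W2)  # all layers except the last: non-linear projection
p = softmax(h @ W_out)       # last layer: classification in the low-dimensional space
print(p.shape)               # (5, 10) class probabilities
```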
SVM
SVMs tend not to overfit as much as generalized linear models because of the strong regularization that is built in. Check out the post SVM, Overfitting, curse of dimensionality for more detail.
K-NN, K-Means
Both K-means and K-NN are greatly impacted by the curse of dimensionality, since both of them rely on the squared L2 distance. As the number of dimensions increases, the distances between data points increase as well. This is why you need a greater number of points to cover the space, in the hope that the distances will remain descriptive.
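A quick NumPy experiment (hypothetical sample sizes) shows this distance concentration directly: for uniform points in the unit hypercube, the ratio between the farthest and nearest pairwise L2 distances shrinks toward 1 as the dimensionality grows, so "nearest" carries less and less information:

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(100, d))            # 100 random points in [0, 1]^d
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(100, k=1)]  # unique pairwise distances
    print(d, dists.max() / dists.min())       # ratio drifts toward 1 as d grows
```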
Feel free to ask specifics about the models, since my answers are pretty general. Hope this helps.
It is not a mathematical object like a derivative that needs to be defined formally without any ambiguity. It is an umbrella term for those two issues encountered when using high-dimensional data. Strictly speaking, the issues are separate and both always present. Depending on your algorithm's vulnerability to them, you may have to deal with both, with neither, or with one but not the other.
I have read the first definition in slightly different versions. This definition, as quoted here, does not tell us what is meant by estimating the function with a given level of accuracy. In any case, if you want to know the value of a function at equidistant points, you will need more points the more dimensions you have. For example, 10 points are needed to sample a 1D line from 0 to 1 every 0.1, 100 points are needed for the unit square, 1000 for the cube, and so on. This affects only algorithms that need sample points in all directions of the feature space.
The second definition is also called the "distance concentration effect". Many, but not all, algorithms are affected by it. It comes into play when the algorithm uses k nearest neighbors or some other technique relying on distance measures. I don't remember in which canonical reference I picked up the term distance concentration, but here are two papers that discuss the issue in detail:
When is nearest neighbor meaningful?
A survey on unsupervised outlier detection in high-dimensional numerical data
In practice, they often happen simultaneously. Strictly speaking, they are always both present. The question is whether your algorithm is vulnerable to both or not. For example, with classic k nearest neighbor learning, you want to have training examples in all regions of the space that your test examples could come from. You are thus vulnerable to the curse of dimensionality in the first sense. You are also vulnerable to the second meaning of the term, since you need to compute distances to establish the nearest neighbors, which makes less sense if all distances converge to the same number.
Both definitions are in use. I suggest you infer from context which meaning a source uses and be specific about which one you mean when using the term.
PS. I've also seen the term used informally when an algorithm's computational complexity increases with the number of dimensions. The term then simply refers to the increasing computational complexity.
Best Answer
That is precisely the unexpected behavior of distances in high dimensions. For 1 dimension, you have the interval [0, 1]. 10% of the points are in a segment of length 0.1. But what happens as the dimensionality of the feature space increases?
That expression (the edge length $e_d(r) = r^{1/d}$ of the sub-cube needed to capture a fraction $r$ of uniformly distributed points) is telling you that if you want to capture 10% of the points, you need a cube with edge length 0.63 in 5 dimensions, 0.79 in 10 dimensions, and 0.98 in 100 dimensions.
As you can see, for increasing dimensions you need to look further and further away to capture the same fraction of points. Even more, it tells you that most of the points end up near the boundary of the cube as the number of dimensions increases, which is unexpected.
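The same numbers fall out of a two-line computation (assuming the $r^{1/d}$ edge-length expression above):

```python
# Edge length of the sub-cube needed to capture a fraction r of uniformly
# distributed points in d dimensions: edge = r ** (1 / d).
r = 0.10
for d in (1, 5, 10, 100):
    print(d, round(r ** (1 / d), 2))  # 0.1, 0.63, 0.79, 0.98
```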