Dimensionality Curse – Does it Affect Some Models More Than Others?

high-dimensionalk nearest neighbourk-meansneural networkssvm

The places I have been reading about dimensionality curse explain it in conjunction to kNN primarily, and linear models in general. I regularly see top rankers in Kaggle using thousands of features on dataset that hardly has 100k data points. They mainly use Boosted trees and NN, among others. That many features seem too high and I feel they would get affected by dimensionality curse. But that doesn't seem to be the case as these models make them top the competitions. So, back to my original question – do some models get affected by dimensionality curse more than others?

Specifically, I am interested in the following models (only because these are the ones that I am aware of/used):

  • Linear and Logistic Regression
  • Decision Trees/RandomForest/Boosted Trees
  • Neural Networks
  • SVM
  • kNN
  • k-means clustering

Best Answer

In general, the curse of dimensionality makes the problem of searching through a space much more difficult, and effects the majority of algorithms that "learn" through partitioning their vector space. The higher the dimensionality of our optimization problem the more data we need to fill the space that we are optimizing over.

Generalized Linear Models

Linear models suffer immensely from the curse of dimensionality. Linear models partition the space in to a single linear plane. Even if we are not looking to directly compute $$\hat{\beta} = (X^{'}X)^{-1}X^{'}y$$ the problem posed is still very sensitive to collinearity, and can be considered "ill conditioned" without some type of regularization. In very high dimensional spaces, there is more than one plane that can be fitted to your data, and without proper type of regularization can cause the model to behave very poorly. Specifically what regularization does is try to force one unique solution to exist. Both L1 and squared L2 regularization try to minimize the weights, and can be interpreted selecting the model with the smallest weights to be the most "correct" model. This can be thought of as a mathematical formulation of Occams Razor.

Decision Trees
Decision trees also suffer from the curse of dimensionality. Decision trees directly partition the sample space at each node. As the sample space increases, the distances between data points increases, which makes it much harder to find a "good" split.

Random Forests
Random Forests use a collection of decision trees to make their predictions. But instead of using all the features of your problem, individual trees only use a subset of the features. This minimizes the space that each tree is optimizing over and can help combat the problem of the curse of dimensionality.

Boosted Tree's
Boosting algorithms such as AdaBoost suffer from the curse of dimensionality and tend to overffit if regularization is not utilized. I won't go in depth, because the post Is AdaBoost less or more prone to overfitting? explains the reason why better than I could.

Neural Networks
Neural networks are weird in the sense that they both are and are not impacted by the curse of dimensionality dependent on the architecture, activations, depth etc. So to reiterate the curse of dimensionality is the problem that a huge amount of points are necessary in high dimensions to cover an input space. One way to interpret deep neural networks is to think of all layers expect the very last layer as doing a complicated projection of a high dimensional manifold into a lower dimensional manifold, where then the last layer classifies on top of. So for example in a convolutional network for classification where the last layer is a softmax layer, we can interpret the architecture as doing a non-linear projection onto a smaller dimension and then doing a multinomial logistic regression (the softmax layer) on that projection. So in a sense the compressed representation of our data allows us to circumvent the curse of dimensionality. Again this is one interpretation, in reality the curse of dimensionality does in fact impact neural networks, but not at the same level as the models outlined above.

SVM
SVM tend to not overffit as much as generalized linear models due to the excessive regularization that occurs. Check out this post SVM, Overfitting, curse of dimensionality for more detail.

K-NN, K-Means

Both K-mean and K-NN are greatly impacted by the curse of dimensionality, since both of them use the L2 squared distance measure. As the amount of dimensions increases the distance between various data-points increases as well. This is why you need a greater amount of points to cover more space in hopes the distance will be more descriptive.

Feel free to ask specifics about the models, since my answers are pretty general. Hope this helps.