Solved – Feature selection before SVM

feature-selection, svm

I have a simple but difficult question: does feature selection before SVM help? I have a data set with ~1100 features, but many of these are redundant or uncorrelated with the outcome. Can someone give me a reason why it should or should not help? Thanks a lot.

Best Answer

The SVM is an approximate implementation of a theoretical bound on generalisation performance that is independent of the dimensionality of the feature space. This gives a principled reason to expect that feature selection will not necessarily make the classifier perform any better.
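
For reference, a representative form of such a bound (the classical radius–margin, leave-one-out result for a hard-margin SVM; I am quoting it from memory, so treat it as indicative rather than exact) for a training set of $m$ examples is

$$\mathbb{E}\left[\mathrm{err}_{\mathrm{LOO}}\right] \;\le\; \frac{1}{m}\,\mathbb{E}\!\left[\frac{R^2}{\gamma^2}\right],$$

where $R$ is the radius of the smallest ball containing the data in feature space and $\gamma$ is the margin. Note that the dimensionality of the feature space does not appear on the right-hand side.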

The reason the SVM works is that it uses regularisation (much as ridge regression does) to avoid over-fitting, so provided you set the regularisation parameter $C$ properly (e.g. by cross-validation), the performance ought to be good without any feature selection.
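
As a concrete sketch of what tuning $C$ by cross-validation looks like in practice (this is my illustration, not part of the original answer: the scikit-learn pipeline, the synthetic wide data set, and the logarithmic grid for $C$ are all assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a wide data set: many redundant/irrelevant features.
X, y = make_classification(n_samples=500, n_features=1100,
                           n_informative=20, n_redundant=100,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune only the single regularisation parameter C by cross-validation;
# no feature selection is performed.
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
grid = GridSearchCV(pipe, {"svc__C": np.logspace(-3, 3, 7)}, cv=5)
grid.fit(X_train, y_train)

print("best C:", grid.best_params_["svc__C"])
print("test accuracy:", grid.score(X_test, y_test))
```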

What is often not mentioned about feature selection is that it can easily make performance worse. The reason is that the more choices about the model are made by optimising some statistic evaluated over the training sample, the more likely you are to over-fit that sample, and feature selection often ends up making many such choices (in the worst case $2^d$, where $d$ is the number of features). In his monograph on subset selection in regression [0], Miller suggests that if you are primarily interested in generalisation performance, you should use ridge regression instead and not do any feature selection. This accords with my experience; I think the reason is that it is harder to over-fit with one continuous parameter tuned by cross-validation than by choosing the best of the $2^d$ combinations of features.
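
To see the contrast empirically, here is a minimal sketch (again my own illustration; the data, the fixed $C$, and the univariate filter standing in for exhaustive subset search, which is infeasible at $2^{1100}$, are all assumptions). The selection step is kept inside each cross-validation fold so that it cannot peek at the held-out data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=1100,
                           n_informative=20, n_redundant=100,
                           random_state=1)

# Plain SVM: one continuous hyper-parameter (C), no selection.
plain = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

# SVM with univariate feature selection refitted inside each CV fold.
selected = make_pipeline(StandardScaler(),
                         SelectKBest(f_classif, k=50),
                         SVC(kernel="linear", C=1.0))

for name, model in [("no selection", plain), ("SelectKBest", selected)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting the selector inside the pipeline matters: selecting features on the full data set before cross-validating would bias both estimates optimistically, which is exactly the kind of over-fitting of model choices described above.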

  • [0] Miller, A. (2002). Subset Selection in Regression, Second Edition. Chapman & Hall/CRC Monographs on Statistics & Applied Probability.