Solved – How to select the final model with elastic net feature selection, cross validation and SVM

cross-validation, elastic net, feature selection, model selection, svm

I have a dataset of some 100 samples, each with >10,000 features, some of which are highly correlated. Here is what I am doing currently.

  1. Split the data set into three folds.

  2. For each fold,
    2.1 Run elastic net for 100 values of lambda. (this returns an nfeatures x 100 coefficient matrix)
    2.2 Take the union of all features with non-zero weights. (this returns an nfeatures x 1 indicator vector)

  3. Select the features corresponding to the non-zero weights from step 2.2.

  4. Use these features for training and testing an SVM (a code sketch of this procedure follows the list).
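For concreteness, here is a minimal sketch of the procedure above using scikit-learn. The names `X` and `y` (a numeric feature matrix and label vector), the choice of `enet_path` with `l1_ratio=0.5`, and `LinearSVC` with `C=1.0` are illustrative assumptions, not necessarily what the original setup uses:

```python
import numpy as np
from sklearn.linear_model import enet_path
from sklearn.model_selection import KFold
from sklearn.svm import LinearSVC

# X: (n_samples, n_features) numpy array, y: (n_samples,) numeric labels -- assumed to exist
kf = KFold(n_splits=3, shuffle=True, random_state=0)
fold_masks = []  # one boolean feature mask per fold (step 3)

for train_idx, test_idx in kf.split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Step 2.1: elastic-net coefficient path over 100 penalty values
    alphas, coefs, _ = enet_path(X_tr, y_tr, l1_ratio=0.5, n_alphas=100)
    # coefs has shape (n_features, 100)

    # Step 2.2: union of features that are non-zero at any penalty value
    mask = np.any(coefs != 0, axis=1)
    fold_masks.append(mask)

    # Step 4: train and test an SVM on the selected features
    clf = LinearSVC(C=1.0, max_iter=10000)
    clf.fit(X_tr[:, mask], y_tr)
    print("fold accuracy:", clf.score(X[test_idx][:, mask], y[test_idx]))
```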

My problem is that in step 3, I get a different set of features for each fold. How do I get one final model out of this? One final list of relevant features? Can I take the intersection of the features selected in step 3 across all folds? Features that are selected in all three folds would appear to be the most stable/significant. Can I do this, or is it cheating?

Best Answer

By "for each fold I get a different set of features", I suspect you mean that you are using a k-fold cross-validation procedure to estimte the performance of the model. The thing to remember about cross-validation is that you are estimating the perfomance of a method of constructing a model, not the model itself. So you form the final model, just use the procedure used in each fold of the cross-validation, but using all of the data, rather than (k-1)/k of it.

I am not sure there is much to be gained from using an elastic net to choose the features for an SVM. The SVM is an approximate implementation of a bound on generalisation performance that is independent of the dimensionality of the input space, so with a good choice of C it should work just fine in a 10,000-dimensional feature space (this matches my practical experience as well).
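If you want to try that route, a minimal sketch (again with scikit-learn; the grid of C values is an arbitrary example) is simply to tune C for a linear SVM on the full feature set:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Tune C by cross-validation on all features, with no explicit feature selection.
grid = GridSearchCV(LinearSVC(max_iter=10000),
                    param_grid={"C": [0.001, 0.01, 0.1, 1, 10]},
                    cv=3)
grid.fit(X, y)
print("best C:", grid.best_params_["C"], "CV accuracy:", grid.best_score_)
```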

As a sort of belt-and-braces approach, you could use bootstrapped SVMs and use the out-of-bag error to estimate performance. If you use a linear SVM, then you can combine all of the bootstrapped SVMs into a single linear model after training, so there is no computational penalty in operation. Likewise, an average of the elastic net models will probably work well too.
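A rough sketch of that bagging idea (assumptions: numpy arrays `X` and `y` with labels coded 0/1, `LinearSVC` base learners, and a simple average of the per-model out-of-bag errors as the performance estimate):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_samples = X.shape[0]
n_models = 50
coefs, intercepts, oob_errors = [], [], []

for _ in range(n_models):
    boot = rng.integers(0, n_samples, size=n_samples)   # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n_samples), boot)      # out-of-bag rows for this model

    clf = LinearSVC(C=1.0, max_iter=10000).fit(X[boot], y[boot])
    coefs.append(clf.coef_.ravel())
    intercepts.append(clf.intercept_[0])
    if oob.size:
        oob_errors.append(1.0 - clf.score(X[oob], y[oob]))

print("out-of-bag error estimate:", np.mean(oob_errors))

# Because each base model is linear, averaging their decision functions
# yields a single linear model: w = mean weight vector, b = mean intercept.
w = np.mean(coefs, axis=0)
b = np.mean(intercepts)
predictions = (X @ w + b > 0).astype(int)  # assumes the positive class is coded 1
```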