Solved – How to select the final model with elastic net feature selection, cross validation and SVM

cross-validation, elastic net, feature selection, model selection, svm

I have a dataset of some 100 samples, each with >10,000 features, some of which are highly correlated. Here is what I am doing currently.

  1. Split the data set into three folds.

  2. For each fold,
    2.1 Run elastic net for 100 values of lambda. (this returns an nfeatures x 100 coefficient matrix)
    2.2 Take the union of all features with non-zero weights. (this returns an nfeatures x 1 indicator vector)

  3. Select the features corresponding to the non-zero weights from step 2.2.

  4. Use these features for training and testing an SVM (a code sketch of this procedure follows the list).
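For concreteness, here is a minimal sketch of the procedure above using scikit-learn. The names `X` and `y` (a numeric feature matrix and label vector), the choice of `enet_path` with `l1_ratio=0.5`, and `LinearSVC` with `C=1.0` are illustrative assumptions, not necessarily what the original setup uses:

```python
import numpy as np
from sklearn.linear_model import enet_path
from sklearn.model_selection import KFold
from sklearn.svm import LinearSVC

# X: (n_samples, n_features) numpy array, y: (n_samples,) numeric labels -- assumed to exist
kf = KFold(n_splits=3, shuffle=True, random_state=0)
fold_masks = []  # one boolean feature mask per fold (step 3)

for train_idx, test_idx in kf.split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Step 2.1: elastic-net coefficient path over 100 penalty values
    alphas, coefs, _ = enet_path(X_tr, y_tr, l1_ratio=0.5, n_alphas=100)
    # coefs has shape (n_features, 100)

    # Step 2.2: union of features that are non-zero at any penalty value
    mask = np.any(coefs != 0, axis=1)
    fold_masks.append(mask)

    # Step 4: train and test an SVM on the selected features
    clf = LinearSVC(C=1.0, max_iter=10000)
    clf.fit(X_tr[:, mask], y_tr)
    print("fold accuracy:", clf.score(X[test_idx][:, mask], y[test_idx]))
```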

My problem is that in step 3, I get a different set of features for each fold. How do I get one final model out of this? One final list of relevant features? Can I take the intersection of the features selected in step 3 across all folds? Features that are selected in all three folds would appear to be the most stable/significant. Can I do this, or is it cheating?

Best Answer

By "for each fold I get a different set of features", I suspect you mean that you are using a k-fold cross-validation procedure to estimte the performance of the model. The thing to remember about cross-validation is that you are estimating the perfomance of a method of constructing a model, not the model itself. So you form the final model, just use the procedure used in each fold of the cross-validation, but using all of the data, rather than (k-1)/k of it.

I am not sure there is much to be gained from using an elastic net to choose the features for an SVM. The SVM is an approximate implementation of a bound on generalisation performance that is independent of the dimensionality of the input space, so with a good choice of C it should work just fine in a 10,000-dimensional feature space (this matches my practical experience as well).
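If you want to try that route, a minimal sketch (again with scikit-learn; the grid of C values is an arbitrary example) is simply to tune C for a linear SVM on the full feature set:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Tune C by cross-validation on all features, with no explicit feature selection.
grid = GridSearchCV(LinearSVC(max_iter=10000),
                    param_grid={"C": [0.001, 0.01, 0.1, 1, 10]},
                    cv=3)
grid.fit(X, y)
print("best C:", grid.best_params_["C"], "CV accuracy:", grid.best_score_)
```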

As a sort of belt-and-braces approach, you could use bootstrapped SVMs and use the out-of-bag error to estimate performance. If you use a linear SVM, then you can combine all of the bootstrapped SVMs into a single linear model after training, so there is no computational penalty in operation. Likewise, an average of the elastic net models will probably work well too.
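A rough sketch of that bagging idea (assumptions: numpy arrays `X` and `y` with labels coded 0/1, `LinearSVC` base learners, and a simple average of the per-model out-of-bag errors as the performance estimate):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_samples = X.shape[0]
n_models = 50
coefs, intercepts, oob_errors = [], [], []

for _ in range(n_models):
    boot = rng.integers(0, n_samples, size=n_samples)   # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n_samples), boot)      # out-of-bag rows for this model

    clf = LinearSVC(C=1.0, max_iter=10000).fit(X[boot], y[boot])
    coefs.append(clf.coef_.ravel())
    intercepts.append(clf.intercept_[0])
    if oob.size:
        oob_errors.append(1.0 - clf.score(X[oob], y[oob]))

print("out-of-bag error estimate:", np.mean(oob_errors))

# Because each base model is linear, averaging their decision functions
# yields a single linear model: w = mean weight vector, b = mean intercept.
w = np.mean(coefs, axis=0)
b = np.mean(intercepts)
predictions = (X @ w + b > 0).astype(int)  # assumes the positive class is coded 1
```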