Hyperparameter Tuning – Is Hyperparameter Tuning Required for Feature Selection Using Wrapper Methods?

Tags: classification, feature selection, hyperparameter, machine learning, neural networks

I am working on a binary classification problem with a class proportion of 77:23 (977 records).

Currently, I am exploring feature selection approaches and have come across methods like the ones below:

a) Featurewiz

b) Sequential forward and backward feature selection

c) Borutapy

d) RFE, etc.

All of the above methods use an ML model to find the best-performing features.
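For concreteness, here is a minimal sketch of (b) and (d) with scikit-learn, using a synthetic 77:23 dataset in place of the real 977 records; the feature count, choice of estimator, and n_features_to_select are illustrative assumptions, not part of my actual setup:

```python
# Minimal sketch: wrapper-style selection with scikit-learn.
# Synthetic imbalanced data stands in for the real 977-record dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SequentialFeatureSelector

X, y = make_classification(n_samples=977, n_features=20,
                           weights=[0.77, 0.23], random_state=0)

# Both selectors wrap an estimator; here it is left at default parameters.
estimator = RandomForestClassifier(random_state=0)

# (b) Sequential forward selection.
sfs = SequentialFeatureSelector(estimator, n_features_to_select=10,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("SFS picked:", sfs.get_support(indices=True))

# (d) Recursive feature elimination.
rfe = RFE(estimator, n_features_to_select=10)
rfe.fit(X, y)
print("RFE picked:", rfe.get_support(indices=True))
```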

My questions are:

a) Do we have to use the best parameters to get the best features?

b) If yes, then once we have selected the features, do we have to run GridSearchCV again and find the best parameters before fitting and predicting?

Or do you think it suffices to use default parameters for feature selection, and then use the best parameters for model building?
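To make question (b) concrete, here is a minimal sketch of that two-stage workflow (default parameters for selection, then GridSearchCV on the selected subset only); the data, estimator, grid, and scoring metric are illustrative assumptions:

```python
# Minimal sketch of the two-stage workflow asked about in (b).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=977, n_features=20,
                           weights=[0.77, 0.23], random_state=0)

# Stage 1: feature selection with a default-parameter estimator.
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=10)
X_selected = rfe.fit_transform(X, y)

# Stage 2: hyper-parameter search on the selected subset only.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    cv=5, scoring="roc_auc",
)
grid.fit(X_selected, y)
print(grid.best_params_, grid.best_score_)
```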

Best Answer

Both feature selection and hyper-parameter (HP) optimization are sub-optimal, approximate procedures. With infinite compute power, we could do both jointly over the whole search space; since we can't, we rely on approximations.

Do we have to use the best parameters to get the best features?

Typical practice is to use a good-enough estimator. The best HPs found with the complete feature set will usually not be the same as the ones found with a feature subset; it's a chicken-and-egg problem. So no, you don't have to. These are all approximate approaches.

You can also take the feature sets found by the above heuristics and include them in your HP search, e.g. include your best three feature sets as an extra search dimension and tune the HPs jointly with the choice of set, as sketched below.
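A minimal sketch of that idea, assuming scikit-learn: the feature subset is wrapped as a pipeline step, so the choice among candidate sets is searched jointly with the HPs. The three index lists are hypothetical placeholders for the sets returned by your wrapper methods.

```python
# Minimal sketch: fold candidate feature sets into the HP search, so the
# feature subset becomes just another "hyper-parameter" of a Pipeline.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

class SubsetSelector(BaseEstimator, TransformerMixin):
    """Keep only the columns listed in `columns`."""
    def __init__(self, columns=None):
        self.columns = columns
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[:, self.columns]

X, y = make_classification(n_samples=977, n_features=20,
                           weights=[0.77, 0.23], random_state=0)

pipe = Pipeline([("subset", SubsetSelector()),
                 ("clf", RandomForestClassifier(random_state=0))])

param_grid = {
    # Candidate feature sets from the wrapper methods (hypothetical indices).
    "subset__columns": [[0, 3, 5, 8], [0, 1, 3, 9, 12], [2, 3, 5, 8, 14]],
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [None, 10],
}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
grid.fit(X, y)
print(grid.best_params_)
```

With this setup, a single cross-validated search scores every (feature set, HP) combination, so the chicken-and-egg coupling between the two choices is handled directly rather than fixing one while optimizing the other.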
