Hyperparameter Tuning – Is Hyperparameter Tuning Required for Feature Selection Using Wrapper Methods?

Tags: classification, feature selection, hyperparameter, machine learning, neural networks

I am working on a binary classification problem with a class proportion of 77:23 (977 records).

Currently, I am exploring feature selection approaches and have come across methods like the ones below:

a) Featurewiz

b) Sequential forward and backward feature selection

c) Borutapy

d) RFE, etc.

All of the above methods use an ML model to find the best-performing features.
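For concreteness, here is a minimal sketch of (b) and (d) with scikit-learn, using a synthetic 77:23 dataset in place of the real 977 records; the feature count, choice of estimator, and n_features_to_select are illustrative assumptions, not part of my actual setup:

```python
# Minimal sketch: wrapper-style selection with scikit-learn.
# Synthetic imbalanced data stands in for the real 977-record dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SequentialFeatureSelector

X, y = make_classification(n_samples=977, n_features=20,
                           weights=[0.77, 0.23], random_state=0)

# Both selectors wrap an estimator; here it is left at default parameters.
estimator = RandomForestClassifier(random_state=0)

# (b) Sequential forward selection.
sfs = SequentialFeatureSelector(estimator, n_features_to_select=10,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("SFS picked:", sfs.get_support(indices=True))

# (d) Recursive feature elimination.
rfe = RFE(estimator, n_features_to_select=10)
rfe.fit(X, y)
print("RFE picked:", rfe.get_support(indices=True))
```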

My questions are:

a) Do we have to use the best parameters to get the best features?

b) If yes, then once we have selected the features, do we have to run GridSearchCV again and find the best parameters before fitting and predicting?

Or do you think it suffices to use default parameters for feature selection, and then use the best parameters for model building?
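To make question (b) concrete, here is a minimal sketch of that two-stage workflow (default parameters for selection, then GridSearchCV on the selected subset only); the data, estimator, grid, and scoring metric are illustrative assumptions:

```python
# Minimal sketch of the two-stage workflow asked about in (b).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=977, n_features=20,
                           weights=[0.77, 0.23], random_state=0)

# Stage 1: feature selection with a default-parameter estimator.
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=10)
X_selected = rfe.fit_transform(X, y)

# Stage 2: hyper-parameter search on the selected subset only.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    cv=5, scoring="roc_auc",
)
grid.fit(X_selected, y)
print(grid.best_params_, grid.best_score_)
```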

Best Answer

Both feature selection and hyper-parameter (HP) optimization are sub-optimal, approximate procedures. With infinite compute power, we could do both jointly over the whole search space; since we can't, we rely on approximations.

Do we have to use the best parameters to get the best features?

Typical practice is to use a good-enough estimator. The best HPs found with the complete feature set will usually not be the same as the ones found with a feature subset; it's a chicken-and-egg problem. So no, you don't have to. These are all approximate approaches.

You can also take the feature sets found by the above heuristics and include them in your HP search, e.g. include your best three feature sets as an extra search dimension and tune the HPs jointly with the choice of set, as sketched below.
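A minimal sketch of that idea, assuming scikit-learn: the feature subset is wrapped as a pipeline step, so the choice among candidate sets is searched jointly with the HPs. The three index lists are hypothetical placeholders for the sets returned by your wrapper methods.

```python
# Minimal sketch: fold candidate feature sets into the HP search, so the
# feature subset becomes just another "hyper-parameter" of a Pipeline.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

class SubsetSelector(BaseEstimator, TransformerMixin):
    """Keep only the columns listed in `columns`."""
    def __init__(self, columns=None):
        self.columns = columns
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[:, self.columns]

X, y = make_classification(n_samples=977, n_features=20,
                           weights=[0.77, 0.23], random_state=0)

pipe = Pipeline([("subset", SubsetSelector()),
                 ("clf", RandomForestClassifier(random_state=0))])

param_grid = {
    # Candidate feature sets from the wrapper methods (hypothetical indices).
    "subset__columns": [[0, 3, 5, 8], [0, 1, 3, 9, 12], [2, 3, 5, 8, 14]],
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [None, 10],
}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
grid.fit(X, y)
print(grid.best_params_)
```

With this setup, a single cross-validated search scores every (feature set, HP) combination, so the chicken-and-egg coupling between the two choices is handled directly rather than fixing one while optimizing the other.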
