Solved – Grid Search for hyperparameter and feature selection

cross-validation, feature selection, hyperparameter, machine learning

So I need to select both my hyperparameters and my features. A full grid search over the joint space of hyperparameters and features is too computationally intensive, so what I am doing instead, for each fold of K-fold cross-validation, is:

1) Tune hyperparameters using CV on the training set of the fold, using all features.

2) Select features using those hyperparameters from step 1.

3) Repeat for each fold

4) The final model is constructed on all the data, using the N features most frequently selected across the folds of CV. Hyperparameters are then tuned again on all the data in a CV loop. (A sketch of this procedure is given below.)
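
For concreteness, here is a minimal sketch of steps 1–4 in scikit-learn. The classifier, the parameter grid, the selection rule (top-N feature importances), and the value of N are all illustrative assumptions, not part of the question:

```python
# Sketch of the per-fold procedure; all concrete choices here are placeholders.
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_classification(n_samples=300, n_features=30, random_state=0)
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}
N = 10  # keep the N most prevalent features overall
counts = Counter()

# 3) the outer loop repeats steps 1 and 2 for each fold
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # 1) tune hyperparameters with CV on this fold's training set, all features
    inner = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
    inner.fit(X_tr, y_tr)

    # 2) select features using the tuned model: here, the top N by importance
    importances = inner.best_estimator_.feature_importances_
    counts.update(np.argsort(importances)[::-1][:N])

# 4) keep the N most frequently selected features, then retune on all the data
top_features = sorted(f for f, _ in counts.most_common(N))
final = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
final.fit(X[:, top_features], y)
print("selected features:", top_features)
print("final hyperparameters:", final.best_params_)
```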

Would there be a large downside to this method compared to a full grid search? In essence I am doing a line search along each dimension of the free parameters (finding the best value in one dimension, holding that constant, then finding the best value in the next dimension), rather than trying every single combination of parameter settings.

Best Answer

The most important downside of searching along single parameters instead of optimizing them all together is that you ignore interactions. It is quite common that more than one parameter influences model complexity; for example, in an RBF-kernel SVM the cost C and the kernel width γ jointly determine the effective complexity, so the best value of one depends on the other. In that case, you need to look at the interaction in order to successfully optimize the hyperparameters.
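
As a toy illustration of this failure mode, consider a made-up two-parameter objective (standing in for a CV score) with an interaction term: because the best value of one parameter depends on the other, the coordinate-wise line search from the question settles on a worse point than the full grid search:

```python
# Toy objective with an interaction between two "hyperparameters" a and b.
import itertools

def score(a, b):
    # interaction term: the best value of `a` depends on the chosen `b`
    return -(a - b) ** 2 - (b - 2) ** 2

grid_a = [0, 1, 2, 3]
grid_b = [0, 1, 2, 3]

# full grid search: evaluate every combination
best_full = max(itertools.product(grid_a, grid_b), key=lambda ab: score(*ab))

# line search: optimize `a` with `b` fixed at its first value, then optimize `b`
a_star = max(grid_a, key=lambda a: score(a, grid_b[0]))
b_star = max(grid_b, key=lambda b: score(a_star, b))

print("full grid search:", best_full, "score", score(*best_full))   # (2, 2), 0
print("line search:", (a_star, b_star), "score", score(a_star, b_star))  # (0, 1), -2
```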

Depending on how large your data set is and how many models you compare, optimization strategies that return the maximum observed performance run into trouble (this is true for both grid search and your strategy). The reason is that searching through a large number of performance estimates for the maximum "skims" the variance of the performance estimate: you may just end up with a model and train/test split combination that happens to look good by accident. Even worse, you may get several perfect-looking combinations, and the optimization then cannot tell which model to choose and becomes unstable.
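
The skimming effect is easy to simulate: give every candidate model the same true performance plus a noisy CV estimate, then pick the maximum observed score. In the sketch below (the true accuracy and noise level are arbitrary assumptions), the selected score drifts upward as the number of compared candidates grows, even though no candidate is actually better:

```python
# Simulate "variance skimming": the max of many noisy, equally good estimates
# systematically overestimates the true performance.
import numpy as np

rng = np.random.default_rng(0)
true_accuracy = 0.80   # every candidate model is equally good
noise_sd = 0.03        # sampling noise of the CV performance estimate

for n_candidates in (1, 10, 100, 1000):
    # observed CV scores over 2000 repeated experiments
    scores = rng.normal(true_accuracy, noise_sd, size=(2000, n_candidates))
    print(f"{n_candidates:5d} candidates -> "
          f"mean selected score {scores.max(axis=1).mean():.3f}")
# The selected model looks better and better, even though nothing improved.
```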
