Solved – For hyperparameter tuning with cross-validation, is it okay for the fold splits to be the same for every hyperparameter trial?

cross-validation, machine-learning, model-evaluation

For hyperparameter tuning (random search, grid search, Bayesian optimization), many trials are performed, each with a different set of hyperparameters. To evaluate how good a set of hyperparameters is, we can use k-fold cross-validation, which splits the training data into k folds.

Previously, I would split the training data into k folds and use the same fold splits for all my hyperparameter trials. However, after trying out sklearn Pipelines, it seems that using a pipeline with RandomizedSearchCV results in a different k-fold split for each hyperparameter trial.

I am wondering whether it matters at all if we use the same k-fold split for all trials, or whether it is important to randomize the split for each trial?
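For concreteness, here is a rough sketch of the "same folds for every trial" approach I was using before; the dataset and estimator are just placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data and estimator; only the fold handling matters here.
X, y = make_classification(n_samples=500, random_state=0)

# Pre-compute one fixed splitter...
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# ...and reuse the identical splits for every hyperparameter trial.
for n_estimators in [50, 100, 200]:
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv)
    print(n_estimators, scores.mean())
```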

Best Answer

It'd actually be better to use the same folds while comparing different hyperparameter settings, as you did initially. If you pass the pipeline object into the RandomizedSearchCV object, it should use the same folds for every candidate. But if you do it the other way around, each run will change the folds, as you said. Even in that case, you can fix the folds by passing a fixed splitter as the cv argument of the search object.
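A minimal sketch of this setup, assuming scikit-learn's RandomizedSearchCV with a placeholder scaler + random-forest pipeline: a splitter with a fixed random_state passed as cv ensures every candidate hyperparameter setting is scored on identical folds.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

# The pipeline goes inside the search object, not the other way around.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# A splitter with a fixed random_state yields identical folds
# for every candidate hyperparameter setting.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = RandomizedSearchCV(
    pipe,
    param_distributions={"clf__n_estimators": randint(50, 300)},
    n_iter=10,
    cv=cv,           # fixed folds shared by all trials
    random_state=0,  # fixes which hyperparameter candidates are drawn
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```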
