Solved – Overfitting in Cross Validation for Hyperparameter Selection

cross-validation, overfitting

I am using 3-fold cross validation for hyperparameter selection for my XGBoost model. To be specific, I use xgboost.cv for cross validation rather than sklearn. I use random search for the hyperparameter search and choose the one or few sets of hyperparameters with the best average score on the held-out fold of data. The standard deviation of the score is approximately the same across all hyperparameter sets.
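For concreteness, a minimal sketch of this kind of setup (the objective, metric, parameter ranges, and the `X`, `y` placeholders below are illustrative assumptions, not my exact code):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
dtrain = xgb.DMatrix(X, label=y)  # X, y: placeholders for the training features/labels

results = []
for _ in range(50):  # number of random-search trials is arbitrary here
    # draw a random candidate set of hyperparameters (ranges are illustrative)
    params = {
        "objective": "binary:logistic",
        "eval_metric": "auc",
        "max_depth": int(rng.integers(3, 10)),
        "eta": 10 ** rng.uniform(-2, -0.5),
        "subsample": rng.uniform(0.6, 1.0),
        "colsample_bytree": rng.uniform(0.6, 1.0),
    }
    # 3-fold cross validation with xgboost's native API
    cv = xgb.cv(params, dtrain, num_boost_round=500, nfold=3,
                early_stopping_rounds=20, seed=0, as_pandas=True)
    best_round = cv["test-auc-mean"].idxmax()
    results.append({
        "params": params,
        "mean": cv.loc[best_round, "test-auc-mean"],
        "std": cv.loc[best_round, "test-auc-std"],
    })

# pick the trial with the best average score on the held-out folds
best = max(results, key=lambda r: r["mean"])
```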

What I observe is that the model is overfitting to the cross-validation data. The score of the model on the test data is always much worse than the average score on the held-out folds. So instead of selecting the set of hyperparameters with the best average score, is there any other criterion I should consider in order to reduce the overfitting?

Best Answer

The average score of your model on the validation folds in cross validation should not be significantly better than the score on the holdout (final test) set, because for each cross-validation fold the fitted model is just as blind to the data in its validation fold as your final model (presumably retrained on all the training data) is to the holdout data.

Is there some quality of your holdout set that is causing your results to be skewed? Are the examples in the training set easier somehow? If you select a different random holdout set, does the problem persist?
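As a concrete way to run that last check, something along these lines would do (a rough sketch assuming the binary-classification/AUC setup from the question; `X`, `y`, and `best_params` are placeholders for your data and your selected hyperparameters):

```python
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

for seed in range(5):
    # re-draw the train/holdout split with a different random seed each time
    X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.2,
                                              random_state=seed, stratify=y)
    dtr = xgb.DMatrix(X_tr, label=y_tr)
    dho = xgb.DMatrix(X_ho)

    # cross-validation score on this training split
    cv = xgb.cv(best_params, dtr, num_boost_round=500, nfold=3,
                early_stopping_rounds=20, seed=seed, as_pandas=True)
    best_round = cv["test-auc-mean"].idxmax()

    # retrain on the full training split and score the holdout
    booster = xgb.train(best_params, dtr, num_boost_round=int(best_round) + 1)
    holdout_auc = roc_auc_score(y_ho, booster.predict(dho))

    print(f"seed={seed}  cv_auc={cv.loc[best_round, 'test-auc-mean']:.4f}  "
          f"holdout_auc={holdout_auc:.4f}")
```

If the gap between the cross-validation score and the holdout score shows up for every seed, the problem is not a quirk of one particular split.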

As long as the variance in performance across the folds is low and the holdout set is statistically similar to the training set, performance should not degrade. Best practice is to select the parameters that yield the reliably (low variance across folds) best-performing (high mean) cross-validation score. I am not aware of any alternative criteria, nor of arguments for why one might prefer them.
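If you want to turn "reliably best-performing" into a concrete selection rule, one simple (and admittedly ad hoc) way is to penalize the mean by the fold-to-fold standard deviation; the `results` list below is assumed to hold the mean and standard deviation of each random-search trial, as in the sketch in the question:

```python
# rank by mean alone (what the question currently does)
best_by_mean = max(results, key=lambda r: r["mean"])

# penalize the mean by one standard deviation across folds, so a candidate
# only wins if it scores well consistently; the weight of 1.0 on the std
# term is an arbitrary choice
best_penalized = max(results, key=lambda r: r["mean"] - r["std"])
```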