Solved – rule of thumb for using standard deviation of k-fold cross-validation scores to pick the best model

cross-validation, model-evaluation, standard-deviation

Let's say I'm comparing 60 different hyperparameter value combinations using 10-fold cross-validation. It's tempting to simply select the combination whose mean accuracy is highest across the folds. However, should one also make use of the standard deviation of the accuracies when deciding on the best combination? If so, is there any particular rule of thumb (e.g. go with the hyperparameter combination that has the highest mean accuracy amongst the better half in terms of standard deviation)?

Best Answer

Sort of. There is the so-called "one standard error rule," which does use the standard deviation of the prediction error estimates, although not in quite the way you mentioned: instead, you divide the standard deviation by the square root of the number of estimates (here, the number of folds, so 10) to form the standard error of the mean estimate.
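In code, for a single hyperparameter combination the calculation looks like this (a minimal sketch in Python; the fold accuracies are made-up numbers):

```python
import numpy as np

# Hypothetical accuracies from 10-fold cross-validation for one
# hyperparameter combination.
scores = np.array([0.81, 0.78, 0.83, 0.80, 0.79,
                   0.82, 0.77, 0.84, 0.80, 0.81])

mean_score = scores.mean()
# Sample standard deviation (ddof=1) divided by sqrt(k), where k is the
# number of folds, gives the standard error of the mean estimate.
std_error = scores.std(ddof=1) / np.sqrt(len(scores))

print(f"mean accuracy = {mean_score:.3f}, standard error = {std_error:.4f}")
```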

The one standard error rule says: pick the simplest model whose mean estimated prediction error is within 1 standard error of the best-performing model's estimated prediction error. In practice, the "simplest model" usually means "the most strongly regularized model." And of course, the "best-performing model" is the one with the lowest mean estimated prediction error of all models tested.

Stated a bit more plainly, the rule says that we want to pick the simplest model that still does essentially as good a job as the best-looking model, since the best-looking model could be far more complicated despite only a marginal gain in performance.
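Here is a minimal sketch of the selection step itself, assuming you already have the mean accuracy and standard error for each candidate plus some notion of model complexity; all names and numbers are hypothetical. Since your scores are accuracies (higher is better), "within one standard error of the best" means a mean at least as high as the best mean minus the best model's standard error:

```python
import numpy as np

def one_se_rule(means, std_errors, complexity):
    """Pick the simplest model within one standard error of the best.

    means      : mean CV accuracy per candidate (higher is better)
    std_errors : standard error of each mean
    complexity : complexity score per candidate (lower = simpler,
                 i.e. more strongly regularized)
    Returns the index of the chosen candidate.
    """
    means = np.asarray(means)
    std_errors = np.asarray(std_errors)
    complexity = np.asarray(complexity)

    best = np.argmax(means)                     # best-looking model
    threshold = means[best] - std_errors[best]  # within 1 SE of the best

    eligible = np.flatnonzero(means >= threshold)
    # Among the eligible candidates, take the simplest one.
    return eligible[np.argmin(complexity[eligible])]

# Hypothetical results for three hyperparameter combinations.
means = [0.82, 0.81, 0.76]
ses = [0.012, 0.011, 0.015]
complexity = [3.0, 1.0, 0.5]  # e.g. inverse regularization strength

print(one_se_rule(means, ses, complexity))  # -> 1: simpler, within 1 SE
```

In this toy example, candidate 0 has the highest mean accuracy, but candidate 1 is within one standard error of it and is more strongly regularized, so the rule picks candidate 1.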
