Solved – Using constant versus changing random seeds for cross validation hyperparam optimisation

cross-validation, hyperparameter, optimization

If I understand things correctly, in a nested cross validation the inner cross validation optimises over the search space of hyperparams, and the outer loop validates the accuracy of the optimal hyperparams determined by the inner loop, i.e. (a code sketch follows this outline):

  • outer cross validation
    • hyperparameter search
      • inner cross validation
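
A minimal sketch of that structure, assuming scikit-learn purely for illustration: `GridSearchCV` performs the inner cross validation over the hyperparameter grid, and `cross_val_score` wraps it in the outer cross validation. The dataset, grid values, and fold counts are all placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # folds for the hyperparam search
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # folds for validating the search result

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=inner_cv)  # inner loop: hyperparameter search

# Outer loop: each outer fold re-runs the full inner search on its training portion,
# then scores the refitted best model on the outer fold's held-out portion.
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print(outer_scores.mean(), outer_scores.std())
```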

We then have the choice of either reusing the same inner cross validation splits (i.e. the same random seed) for every hyperparam vector candidate we investigate, or randomising the splits (i.e. changing the random seed) for each candidate's inner cross validation.
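
To make the two options concrete, here is a hedged sketch (scikit-learn assumed; the helper `score_candidates`, the `reuse_splits` flag, and the candidate grid are all hypothetical names introduced for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

def score_candidates(X, y, candidates, reuse_splits=True):
    """Score each hyperparameter candidate with an inner cross validation.

    reuse_splits=True : every candidate is scored on the identical folds (fixed seed).
    reuse_splits=False: each candidate gets freshly randomised folds (new seed each time).
    """
    fixed_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    rng = np.random.RandomState(42)
    results = {}
    for C in candidates:
        cv = fixed_cv if reuse_splits else KFold(
            n_splits=5, shuffle=True, random_state=rng.randint(10_000))
        results[C] = cross_val_score(SVC(C=C), X, y, cv=cv).mean()
    return results
```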

On the one hand, I can see that by keeping the seed the same we are only changing one variable (namely the chosen vector of hyperparam candidates), which makes the hyperparam optimisation easier: if both the hyperparams and the data splits are changing, the optimisation has to deal with more free variables.

On the other hand, if we randomise the folds for each hyperparam vector candidate, there is less chance of ending up in a local minimum/maximum caused by an "unlucky" single choice of inner cross validation split, i.e. a split that produces a model which looks optimal for that one split but not for other possible splits.
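
One way to see the size of this "luck" effect is to hold the hyperparameters fixed and vary only the split seed; the spread of scores is the component that a single fixed split can accidentally reward. An illustrative sketch (scikit-learn assumed, toy data and fixed hyperparameter values chosen arbitrarily):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
model = SVC(C=1.0, gamma=0.1)  # one fixed hyperparameter vector

# Same model, same data, 20 different ways of splitting into folds.
scores = [
    cross_val_score(model, X, y,
                    cv=KFold(n_splits=5, shuffle=True, random_state=seed)).mean()
    for seed in range(20)
]
print(f"CV score across seeds: min={min(scores):.3f}, max={max(scores):.3f}")
```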

How does the choice of randomise versus not randomise for the inner cross validation affect the hyperparameter vector search optimisation?

I suspect that the answer is very dependent on the size of the hyperparam search space (i.e. the difficulty of optimising the hyperparams) versus the distribution of the data (i.e. the probability of choosing a really "bad" split and wrongly concluding we've found the best vector of hyperparams).

Best Answer

If you search for a zillion hyperparameter combinations, you will start to overfit whatever you're testing those against.

Therefore, I'd be tempted to take one single train/validate split and do the hyperparameter search on that. Then, and only then, evaluate the chosen hyperparameters against some other split, fold, or held-out test/validation data set.
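
A sketch of that suggestion (scikit-learn assumed; the split proportions and grid are placeholders): carve off a test set first, run the whole hyperparameter search on a single fixed train/validate split, and only afterwards score the chosen model on the untouched test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
X_work, X_test, y_work, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One fixed train/validate split inside the working data: -1 = always train, 0 = validate.
fold = np.zeros(len(X_work), dtype=int)
fold[: int(0.75 * len(X_work))] = -1
single_split = PredefinedSplit(fold)

# Hyperparameter search on that single split only; refit the winner on all working data.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=single_split, refit=True)
search.fit(X_work, y_work)

print("chosen params:", search.best_params_)
print("held-out test score:", search.score(X_test, y_test))
```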

Using random splits for each set of parameters sounds like something to avoid because:

  • as you say, for some hyperparameter options you'll "get lucky", just because the train/validate split happens to give a randomly high score
  • you're basically overfitting to every possible train/validate split of your data, leaving you no novel splits to validate against