Solved – Early stopping together with hyperparameter tuning in neural networks

hyperparameter, neural networks, validation

Similar to this question (hyperparameter tuning in neural networks), I have a neural network with a similar list of parameters to the one in the linked question:

  • Learning rate: $[0.001, 0.01, 0.1]$
  • $L_1$ penalty: $[0.01, 0.05, 0.1, 0.5]$
  • Early stopping tolerance: $[0.0001, 0.001, 0.01]$

The paper I'm replicating didn't use dropout, but it also didn't specify exactly how the hyperparameter tuning was done. So I've reserved a portion of the data for choosing the learning rate and the $L_1$ penalty, but for how many epochs do I train?

This is where early stopping comes in. I can either split my training data further and use a smaller portion just for early stopping, or I can use my larger validation set for early stopping and also use the validation error at the point where training stops to choose my hyperparameters. Conceptually, I would train my model solely on the training set and choose hyperparameters using the validation set, but stopping training early and choosing hyperparameters with the same data seems to require the supposedly "unseen" data during training. Which method should I use?

Best Answer

Both approaches are possible: you can use the same validation split for early stopping and for hyperparameter tuning, or keep two validation splits, one for hyperparameter selection and one for early stopping.

The first option will give you somewhat biased results, since you report the validation error of a network that has "seen" the validation data during training (in the form of choosing the optimal early stopping point using that data). On the other hand, it is easier to implement and requires less data. If you are aware of this bias and its implications, this approach may be acceptable.
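To make the data flow concrete, here is a minimal sketch of that first option, assuming the hyperparameter grid from the question. A tiny linear model trained by gradient descent with an $L_1$ penalty stands in for the neural network, and all names (`fit_with_early_stopping`, `n_val`, the synthetic data) are illustrative, not from the paper. The point is only that the same held-out set both stops training and scores each configuration:

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                     # placeholder data
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=1000)

n_val = 200                                         # one split used for both purposes
X_tr, y_tr = X[:-n_val], y[:-n_val]
X_val, y_val = X[-n_val:], y[-n_val:]


def fit_with_early_stopping(X_fit, y_fit, X_es, y_es, lr, l1, tol, max_epochs=500):
    """Gradient descent with an L1 penalty; stop once the held-out error
    fails to improve by more than `tol`."""
    w = np.zeros(X_fit.shape[1])
    best_err = np.inf
    for _ in range(max_epochs):
        grad = X_fit.T @ (X_fit @ w - y_fit) / len(y_fit) + l1 * np.sign(w)
        w -= lr * grad
        err = np.mean((X_es @ w - y_es) ** 2)
        if best_err - err < tol:                    # no meaningful improvement: stop
            break
        best_err = err
    return w, best_err


best = None
for lr, l1, tol in product([0.001, 0.01, 0.1],
                           [0.01, 0.05, 0.1, 0.5],
                           [1e-4, 1e-3, 1e-2]):
    # The validation set both stops training and scores this configuration,
    # so the reported validation error is optimistically biased.
    w, val_err = fit_with_early_stopping(X_tr, y_tr, X_val, y_val, lr, l1, tol)
    if best is None or val_err < best[0]:
        best = (val_err, (lr, l1, tol))

print("best (biased) validation error:", best[0], "with", best[1])
```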

However, the second option (two separate validation splits) is more precise and thus preferable.
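Continuing the sketch above under the same assumptions, the second option carves an additional early-stopping split out of the training data, so the validation error used to compare hyperparameter settings is never touched by the stopping rule (the split size `n_es` is again arbitrary):

```python
n_es = 150                                          # illustrative size of the early-stopping split
X_fit, y_fit = X_tr[:-n_es], y_tr[:-n_es]
X_es, y_es = X_tr[-n_es:], y_tr[-n_es:]

best = None
for lr, l1, tol in product([0.001, 0.01, 0.1],
                           [0.01, 0.05, 0.1, 0.5],
                           [1e-4, 1e-3, 1e-2]):
    # Early stopping sees only (X_es, y_es); the comparison of hyperparameter
    # settings uses (X_val, y_val), which the training loop never looks at.
    w, _ = fit_with_early_stopping(X_fit, y_fit, X_es, y_es, lr, l1, tol)
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if best is None or val_err < best[0]:
        best = (val_err, (lr, l1, tol))

print("best validation error:", best[0], "with", best[1])
```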
