Solved – Early stopping for CNN to improve speed of training

convolution, deep learning, neural networks, regularization

I want to implement early stopping for my convolutional neural network (CNN). The main reason is that I want to test the CNN under various parameter settings, some of which may need more iterations to train than others. I therefore want to set a fairly high limit on the number of iterations and use an early stopping criterion to avoid spending time on training after performance has converged. Early stopping would also help avoid overfitting, but that is just a by-product, as my CNN doesn't overfit much thanks to dropout etc.

When looking into the literature, I haven't been able to find any information on using early stopping to optimize training time rather than generalization performance. Furthermore, I have only found information on ANNs with a single hidden layer (such as http://page.mi.fu-berlin.de/prechelt/Biblio/stop_tricks1997.pdf), whereas my CNN has 4 convolutional layers and a few other layers, and may well need a different early stopping criterion.

I have split my data into the following 3 sets:

  • Train = 60%
  • Validation = 20%
  • Test = 20%

I want to use the validation set to decide when to stop training, and then use the test set to estimate generalisation performance. However, it is unclear to me which metric to compute on the validation set to decide when to stop. Can anyone recommend an early stopping criterion for a 4-layer CNN that is simple to implement?

EDIT: Specified problem further. Sorry for any misunderstandings.

Best Answer

You can use validation on a held-out set (a simple form of cross-validation) to trigger early stopping. This involves splitting the data into a training set and a validation set. After a fixed number of iterations on the training data, check whether the extra iterations improved performance on the validation set. If so, continue training; if not, consider stopping.
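A minimal sketch of that loop in Python, in case it helps. The callables `train_step` and `validation_loss` and the parameters `max_iters` and `check_every` are hypothetical placeholders, not part of any particular framework; plug in whatever your CNN code provides. This version stops at the first check that shows no improvement.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_step: Callable[[], None],
    validation_loss: Callable[[], float],
    max_iters: int = 100_000,
    check_every: int = 500,
) -> Tuple[int, float]:
    """Train until validation loss stops improving or max_iters is reached.

    Returns (iterations run, best validation loss seen).
    """
    best = float("inf")
    for it in range(1, max_iters + 1):
        train_step()  # one update on the training data (hypothetical callable)
        if it % check_every == 0:
            current = validation_loss()  # computed on the validation set only
            if current < best:
                best = current  # improved: keep training
            else:
                return it, best  # no improvement at this check: stop
    return max_iters, best  # hit the iteration limit without stopping early
```

Validation loss is used here as the metric, but validation accuracy (with the comparison flipped) would work the same way. The high `max_iters` acts as the upper limit from the question, and `check_every` trades off how quickly you notice convergence against the cost of evaluating the validation set.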

If you start with very small parameter values, early stopping also provides a form of regularization.

In practice the validation curve is not always smooth; performance on the validation set tends to go up and down, so a single worse check does not necessarily mean training has converged. For more in-depth information, see: http://page.mi.fu-berlin.de/prechelt/Biblio/stop_tricks1997.pdf
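One common way to cope with a noisy validation curve is a "patience" rule: only stop after several consecutive checks without improvement. The sketch below is an illustration of that heuristic, not necessarily one of the criteria from the linked paper; the names `PatienceStopper`, `patience`, `min_delta` and `stopping_index` are made up for this example.

```python
from typing import Optional, Sequence


class PatienceStopper:
    """Signal a stop after `patience` consecutive checks without improvement."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience      # how many bad checks in a row to tolerate
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")      # best validation loss seen so far
        self.bad_checks = 0           # consecutive checks without improvement

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0       # improvement resets the counter
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def stopping_index(losses: Sequence[float], patience: int = 3,
                   min_delta: float = 0.0) -> Optional[int]:
    """Return the index of the check at which training would stop, or None."""
    stopper = PatienceStopper(patience, min_delta)
    for i, loss in enumerate(losses):
        if stopper.should_stop(loss):
            return i
    return None


# Example: the temporary bump at index 2 does not trigger a stop with patience=3.
noisy_curve = [1.0, 0.9, 0.95, 0.85, 0.86, 0.87, 0.88]
print(stopping_index(noisy_curve, patience=3))
```

If you go this way, it is usual to save the weights at the best check and restore them after stopping, since the final weights are `patience` checks past the best point. A larger `patience` is more robust to noise but costs more training time, which matters if speed is the main goal.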
