Neural Networks – Is Epoch Optimization Possible with Constant Mini-Batch Size in CV?

cross-validation, hyperparameter, neural networks

Assume that you have found the optimal hyperparameters of a neural network (e.g. a multi-layer feedforward NN) with k-fold cross-validation in a grid search. Let's assume you varied the number of epochs and obtained the optimal epoch number as a result.

Now you want to train the network for that optimal number of epochs on the whole data used for CV. In my opinion, the following problem arises:
When using a constant mini-batch size, the number of iterations per epoch varies with the dataset size, so the optimal epoch number found on the smaller training split in the CV cannot simply be applied to training on the whole CV data.
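For example (numbers made up): with a mini-batch size of 32, a CV training split of 8,000 samples gives 250 weight updates per epoch, while the full CV data of 10,000 samples gives about 313 updates per epoch, so the same epoch count means a different amount of training in the two cases.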
Am I missing something here? Right now I cannot find a proper way to determine the optimal number of training epochs for my NN. I have searched the net for some time, but could not find an answer.

P.S. I only described the problem after the inner loop of nested CV, but the same problem arises after the outer loop (training the model on ALL data, not only on the CV data, before putting the model into production).

Best Answer

The "optimal" number of epochs for a neural network training is not very reliable (i.e. has high variance), so generally I would recommend against it. The benefit you get from using a few more samples is usually less than the benefit a validation set gives you (early stopping). See this answer for more details.

Otherwise, you are correct. If you change the size of the dataset while keeping the mini-batch size fixed, an "epoch" corresponds to a different number of weight updates, and the number of updates is the real measure of training length. You would have to work with the update count rather than the epoch count. See this answer for details.
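A small sketch of that conversion, with made-up numbers (8,000 samples in the CV training split, 10,000 samples in the full CV data, batch size 32, 20 "optimal" epochs): keep the total number of weight updates fixed and translate it into a possibly fractional epoch count for the larger dataset:

```python
import math

batch_size = 32
n_cv_split = 8_000       # samples in one CV training split (assumed)
n_full = 10_000          # samples in the whole CV data (assumed)
epochs_optimal = 20      # epoch count selected in the grid search (assumed)

# total number of weight updates the "optimal" setting actually corresponds to
target_updates = math.ceil(n_cv_split / batch_size) * epochs_optimal   # 250 * 20 = 5000

# equivalent training length on the full data, keeping the update count fixed
updates_per_epoch_full = math.ceil(n_full / batch_size)                # 313
epochs_full = target_updates / updates_per_epoch_full                  # ~16 epochs

print(target_updates, epochs_full)
```

In practice you would then train on the full data for `target_updates` mini-batch updates (or round `epochs_full` to the nearest whole epoch), rather than reusing the original epoch number.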