K-Fold Cross Validation – How Many Epochs to Train For?

cross-validation, neural-networks

I am doing k-fold cross validation across my training set with the goal of finding the best structure for a neural network.

Within each fold, should I
A) train the network for a constant number of epochs, OR
B) train until the error on the fold's held-out data starts increasing?

If I do B), then each candidate parameter set will be trained for a different number of epochs. This gives me an additional hyperparameter (the number of epochs) that I could tune, but I am planning to use a separate holdout set to test performance. Should I then just ignore the number of epochs that cross-validation found and instead train on the entire training set until the error on the holdout set starts increasing?
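For concreteness, here is a minimal sketch of option B using scikit-learn; the dataset, network size, epoch cap, and patience value are placeholders, not anything prescribed above. Within each fold the network is trained one epoch at a time with `partial_fit`, and training stops once the error on the held-out fold stops improving:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)  # toy data
classes = np.unique(y)

fold_errors, fold_epochs = [], []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    net = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
    best_err, best_epoch, patience = np.inf, 0, 5
    for epoch in range(1, 201):
        # one pass over this fold's training data
        net.partial_fit(X[train_idx], y[train_idx], classes=classes)
        err = log_loss(y[val_idx], net.predict_proba(X[val_idx]))
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # held-out error stopped improving: early stop
    fold_errors.append(best_err)
    fold_epochs.append(best_epoch)

print("mean CV error:", np.mean(fold_errors))
print("epochs per fold:", fold_epochs)  # typically differs across folds
```

Note how `fold_epochs` generally comes out different in every fold, which is exactly the situation the question describes.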

Best Answer

You could do A, but it is not recommended. The steps most often employed are described on pg. 245 of the text (pg. 264/764 in the PDF):

https://web.stanford.edu/~hastie/Papers/ESLII.pdf

An important caveat: the book recommends the one-standard-error rule: "Often a 'one-standard error' rule is used with cross-validation, in which we choose the most parsimonious model whose error is no more than one standard error above the error of the best model." I have also seen papers where the minimum-error model was chosen instead. Neither method is right or wrong; these are heuristics. The one-standard-error rule is motivated by the bias-variance tradeoff: among models whose CV errors are statistically indistinguishable, the simpler one is preferred.
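As an illustration, here is a sketch of the one-standard-error rule. The fold errors below are made up, and the candidates are assumed to be ordered from most to least parsimonious:

```python
import numpy as np

cv_errors = np.array([   # hypothetical 5-fold errors for 4 candidate sizes
    [0.30, 0.32, 0.29, 0.31, 0.33],   # smallest network
    [0.24, 0.26, 0.25, 0.27, 0.23],
    [0.22, 0.25, 0.23, 0.26, 0.21],   # minimum mean error
    [0.23, 0.27, 0.24, 0.28, 0.22],   # largest network
])
means = cv_errors.mean(axis=1)
# standard error of the mean across the K folds
ses = cv_errors.std(axis=1, ddof=1) / np.sqrt(cv_errors.shape[1])

best = means.argmin()
threshold = means[best] + ses[best]
# first (most parsimonious) candidate within one SE of the best
chosen = int(np.flatnonzero(means <= threshold)[0])
print("minimum-error model:", best, "one-SE choice:", chosen)
```

With these numbers the rule picks candidate 1 rather than candidate 2, trading a small increase in estimated error for a simpler model.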

The number of iterations is typically not relevant; the convergence criterion is. Is there a reason you care about the number of iterations?
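For example, here is a sketch of stopping on a convergence criterion rather than a fixed iteration count; the toy objective, learning rate, and tolerance are assumptions for illustration:

```python
import numpy as np

def fit(w0, lr=0.1, tol=1e-6, max_iter=10_000):
    w, prev_loss = w0, np.inf
    for it in range(max_iter):
        loss = (w - 3.0) ** 2          # toy objective with minimum at w = 3
        if prev_loss - loss < tol * max(prev_loss, 1.0):
            return w, it               # converged: improvement below tol
        w -= lr * 2.0 * (w - 3.0)      # gradient step
        prev_loss = loss
    return w, max_iter                 # fell back to the iteration cap

w, iters = fit(0.0)
print(f"w = {w:.4f} after {iters} iterations")
```

The iteration count here is an outcome of the tolerance, not a quantity you fix in advance, which is the point of the remark above.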