Solved – Is it ok to determine early stopping using the validation set in 10-fold cross-validation

cross-validation, machine-learning

I am working on a machine learning experiment comparing several different neural network classifiers by applying them to a large number of datasets, using stratified 10-fold cross-validation. I measure performance as the average of the errors on the validation set (sometimes referred to as the test set) across the 10 folds of the cross-validation procedure.

My question is, would it be ok to use this same validation set for early stopping of the training procedure? Early stopping would work as follows: after each epoch, the trained model is applied to the validation set and its performance measured; if performance declines for a number of successive epochs, training is halted and the model from the last well-performing epoch is kept. This would be applied to all the different techniques, across all the different datasets.
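Concretely, the stopping rule I have in mind looks something like the sketch below (scikit-learn's MLPClassifier is just a stand-in for my networks, and the patience value and layer size are arbitrary):

```python
# Minimal sketch of the patience-based early-stopping rule described above.
# MLPClassifier, the hidden-layer size, and patience=5 are illustrative choices,
# not part of the actual experiment.
import copy
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_with_early_stopping(X_train, y_train, X_val, y_val,
                              max_epochs=200, patience=5):
    classes = np.unique(y_train)
    model = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
    best_error = float("inf")
    best_model = None
    epochs_since_improvement = 0

    for epoch in range(max_epochs):
        model.partial_fit(X_train, y_train, classes=classes)  # one pass = one "epoch"
        val_error = 1.0 - model.score(X_val, y_val)            # validation error

        if val_error < best_error:
            best_error = val_error
            best_model = copy.deepcopy(model)                  # keep the best epoch's model
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:           # declined for `patience` epochs
                break

    return best_model, best_error
```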

Is this ok? Or is it statistically inaccurate?

Best Answer

I am not completely clear on what the question is asking, but I think the answer is no. The thing you need to think hard about with cross-validation is that no part of your algorithm can have any access to the test set. If it does, then your cross-validation results will be tainted and will not be an accurate measure of the 'true' error.

From your question, I assume you are using some kind of iterative learning algorithm, such as a GBM, and using the validation set to determine when the ensemble has enough models and has started to overfit. If this is true, then what you are doing is not optimal.

The way to think of this is that the stopping criterion is part of your learning algorithm. If it is part of the algorithm, then it can't use the test set in any way.

You may need to do nested cross-validation. In the outer loop, you divide the data into training and test sets; in the inner loop, you further divide the outer training set into sub-training and sub-test sets and proceed as you have been. The inner-loop cross-validation is used to learn, from the training data alone, when to stop. To get an accurate generalization error, you then apply that stopping point to the outer-loop test set, which the inner loop has never touched. To be clear: say the inner-loop cross-validation finds that the best number of iterations is 10. In the outer loop you then train a model on the full outer training set for 10 iterations and see how it performs on the test set.
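As a rough sketch of that nested scheme (again with scikit-learn pieces as stand-ins): instead of the patience rule, the inner loop here just evaluates a small grid of epoch counts to pick the stopping point, then a fresh model is trained on the full outer training fold and scored once on the untouched outer test fold. The epoch grid and fold counts are illustrative.

```python
# Illustrative nested cross-validation: the inner CV chooses the number of
# epochs using only the outer training fold; the outer test fold is touched
# exactly once, by the final retrained model.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

def fit_for_n_epochs(X, y, n_epochs, classes):
    model = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
    for _ in range(n_epochs):
        model.partial_fit(X, y, classes=classes)   # one pass = one "epoch"
    return model

def nested_cv_error(X, y, epoch_grid=(5, 10, 20, 40), n_outer=10, n_inner=5):
    classes = np.unique(y)
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=0)
    outer_errors = []

    for train_idx, test_idx in outer.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]

        # Inner loop: pick the stopping point using only the outer training fold.
        inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=1)
        mean_errors = []
        for n_epochs in epoch_grid:
            fold_errors = []
            for in_tr, in_val in inner.split(X_tr, y_tr):
                m = fit_for_n_epochs(X_tr[in_tr], y_tr[in_tr], n_epochs, classes)
                fold_errors.append(1.0 - m.score(X_tr[in_val], y_tr[in_val]))
            mean_errors.append(np.mean(fold_errors))
        best_epochs = epoch_grid[int(np.argmin(mean_errors))]

        # Outer loop: retrain on the full training fold, score once on the test fold.
        final_model = fit_for_n_epochs(X_tr, y_tr, best_epochs, classes)
        outer_errors.append(1.0 - final_model.score(X_te, y_te))

    return np.mean(outer_errors)
```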

Does this make sense?

Note that depending on the models in use and the dataset, this may or may not be a big issue. The downside is that nested cross-validation can be very computationally expensive. Doing things the way you have been may well be an appropriate trade-off between accuracy and computational time in your circumstances. The strictest answer to your question is no, it is not completely valid cross-validation. Whether it is passable for your circumstances is a different question.
