Solved – Overfitting during epochs

Tags: deep learning, overfitting, validation

Especially in deep learning, a model's validation accuracy usually does not merely plateau once the model starts overfitting; it decreases. Since we only measure validation accuracy after each epoch, couldn't a model start overfitting in the middle of an epoch?

From my understanding, this could be a significant problem when a lot of similar data is available, but I haven't found any material about it and have never seen such an approach in practice. Is there any reason besides computational cost that we only ever validate our models after an epoch? And would it make sense to check validation accuracy every n batches rather than every epoch if you are trying to squeeze the last bit of performance out of a model (e.g., for a Kaggle competition)?

Best Answer

There is no particular theoretical reason behind evaluating after each epoch.

"Epoch" is often used as a somewhat magical term, but there is really nothing magical there. Epoch simply means $\frac{N}{K}$ mini-batches, where $N$ is dataset size and $K$ is the mini-batch size. When the data augmentation is employed (usually the case with deep nets), the argument that "during an epoch every training sample is used" becomes invalid since new samples are generated every time. So if you want to get every little bit of your model, evaluate after every batch (if you have the time)1.

In practice, deep nets usually don't overfit that quickly (even though the scenario you describe is possible in theory), so evaluating at the end of every epoch is simply a convenient default.

In practice, you could save a snapshot of the weights after every epoch; if you notice that overfitting starts within some particular epoch, you can restart from the last snapshot before it and this time evaluate after every batch, as sketched below.
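A sketch of that snapshot-and-resume workflow, again PyTorch-style and purely illustrative; `train_one_epoch` and the snapshot filenames are assumed helpers, not a standard API:

```python
import torch

def train_with_snapshots(model, train_loader, optimizer, loss_fn,
                         train_one_epoch, num_epochs=20):
    """Save a checkpoint after every epoch so any epoch can be replayed."""
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader, optimizer, loss_fn)
        torch.save({"epoch": epoch,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()},
                   f"snapshot_epoch_{epoch}.pt")

def resume_from(model, optimizer, epoch):
    """Reload the snapshot taken at the end of `epoch`, e.g. to re-run
    the following epoch with per-batch evaluation."""
    ckpt = torch.load(f"snapshot_epoch_{epoch}.pt")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["epoch"]
```

If overfitting appears to begin during epoch 7, you would reload the epoch-6 snapshot and re-run from there with a much finer evaluation schedule.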


[1] Note that in this case you simply start overfitting to the validation set instead, which may even hurt test performance.