Solved – Classification accuracy increasing while overfitting

Tags: overfitting, validation

I'm training a classification model, and these are the plots for accuracy and loss history.

Accuracy plot
Loss plot

Besides the fact that the learning rate is too large, what I understand is that the model starts overfitting at around epoch 1000 (the round dot in the loss plot marks the minimum validation loss over the whole training run). However, validation accuracy keeps increasing, though slowly.

At first I thought that, when reshuffling each split's samples, I was mistakenly mixing training and validation samples, but that does not seem to be the case.

Is something wrong going on here? Or does reducing the training error somehow improve validation accuracy, even when overfitting?

UPDATE: Apparently, a few samples in the validation set are quite similar to samples in the training set. What I've found is that as training goes on, the model learns to recognize those samples, while the loss it incurs when it makes mistakes becomes very large, thus causing the average loss to keep increasing.
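A quick way to see how this can happen is to compute accuracy and cross-entropy on a handful of made-up predictions: a few confidently wrong predictions blow up the mean loss even as the thresholded accuracy goes up. This is just a toy sketch with invented probabilities, not the actual model's outputs:

```python
import numpy as np

def binary_cross_entropy(y_true, p):
    """Mean log loss for binary labels and predicted P(class 1)."""
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def accuracy(y_true, p):
    """Fraction of samples whose 0.5-thresholded prediction is correct."""
    return np.mean((p >= 0.5) == y_true)

y_true = np.array([1] * 9 + [0])  # 10 validation samples, the last one is "hard"

# Earlier epoch: modest confidence everywhere, two samples mildly wrong.
p_early = np.array([0.6] * 8 + [0.45, 0.55])

# Later epoch: the easy samples are now correct and very confident,
# but the hard sample is wrong with extreme confidence, so its log loss explodes.
p_late = np.array([0.95] * 9 + [0.999])

for name, p in [("early", p_early), ("late", p_late)]:
    print(f"{name}: accuracy={accuracy(y_true, p):.2f}  loss={binary_cross_entropy(y_true, p):.3f}")

# early: accuracy 0.80, loss ~0.57
# late:  accuracy 0.90, loss ~0.74  <- accuracy went up while the mean loss also went up
```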

Best Answer

I think this reflects the nature of your data, whereby you can get increasing accuracy at the expense of the more useful measure, the loss.

By way of explanation, imagine that your data is the stock market and you want to classify up or down days for the purpose of investing. In this scenario, not all days are of equal value: a relatively few days will make or break your investing career, and increasing the classification accuracy on days that have little or no movement is irrelevant to your desired outcome of making a profit. It would be much better to have pretty poor accuracy in aggregate but be highly accurate in predicting the few days when the market makes monster moves.
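To make that concrete, here is a small sketch with entirely invented numbers: model A has the higher accuracy but misses the two big-move days, while model B has worse accuracy yet is the one you would actually want to trade with.

```python
import numpy as np

# Toy illustration of the stock-market analogy (all numbers invented).
# Each day has a true direction (1 = up, 0 = down) and a move size in %.
true_dir  = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
move_size = np.array([0.1, 0.1, 0.2, 0.1, 0.1, 8.0, 0.1, 0.2, 0.1, 7.0])  # two "monster" days

# Model A: right on 8 of the 10 days, but wrong on both big-move days.
pred_a = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 1])
# Model B: wrong on several quiet days, right on both big-move days.
pred_b = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 0])

def accuracy(pred):
    return np.mean(pred == true_dir)

def profit(pred):
    # Gain the day's move when the call is right, lose it when the call is wrong.
    return np.sum(np.where(pred == true_dir, move_size, -move_size))

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(f"Model {name}: accuracy={accuracy(pred):.0%}  profit={profit(pred):+.1f}%")

# Model A: accuracy 80%, profit -14.0%
# Model B: accuracy 60%, profit +15.2%
```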