Solved – overtrain the CNN

Tags: conv-neural-network, deep learning, machine learning, neural networks, overfitting

This may sound silly: can a CNN be overtrained? From what I have seen so far, many epochs and many parameters are the keys to a CNN's success, and I have found almost nothing about overtraining a CNN.

However, I have run into an issue with my dataset. As I keep training for more epochs, the training accuracy keeps increasing, but the validation accuracy is highest at epoch 1 and then starts to decrease (I ran about 30 epochs).

Is this a problem with my dataset? Did I do something wrong? Or am I overtraining the CNN, so that less than one epoch is enough?

Best Answer

A CNN, like any other neural network, overfits the training data if it is trained for too long on the same training dataset. One purpose of the validation set is to tell you when to stop training: once performance on the validation set starts decreasing, the model is overfitting the training data. Check this for more info.
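The stopping rule described above is usually called early stopping. Here is a minimal, framework-free sketch of it, assuming you record one validation accuracy per epoch. The function name `early_stopping_epoch`, the `patience` parameter, and the accuracy values are all made up for illustration; they are not from the question.

```python
def early_stopping_epoch(val_accuracies, patience=3):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` epochs have passed without the
    validation accuracy beating its best value so far.
    """
    best_acc = float("-inf")
    best_epoch = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            # New best: this is the checkpoint you would keep.
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Never triggered: training ran through all recorded epochs.
    return best_epoch, len(val_accuracies)


# Hypothetical validation curve that peaks at epoch 3.
history = [0.71, 0.74, 0.76, 0.75, 0.74, 0.73, 0.72]
best_epoch, stop_epoch = early_stopping_epoch(history, patience=3)
print(best_epoch, stop_epoch)
```

In practice you would save the model weights at each new best epoch and restore them after stopping. A curve like the asker's, which peaks at epoch 1, would simply give `best_epoch == 1`.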

Related Solutions

Solved – Oscillating validation accuracy for a convolutional neural network

This is likely due to the ordering of your dataset. If there are many observations of the same class in a row, the weights of the network will move too far in the direction of predicting that class.

A common cause is balancing the classes in your dataset by resampling observations and appending them to the dataset. Shuffle your dataset; that should help you avoid the fluctuations in accuracy (and perhaps obtain a higher accuracy overall).
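As a rough sketch of the shuffling step, assuming the samples and labels are stored in two parallel Python lists: the helper `shuffle_together` and the toy data are hypothetical. The important detail is that both lists are permuted with the same order, so each sample keeps its label.

```python
import random


def shuffle_together(samples, labels, seed=0):
    """Shuffle samples and labels with one shared permutation."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    return [samples[i] for i in order], [labels[i] for i in order]


# Toy dataset where all of class 0 comes before all of class 1,
# as happens when resampled observations are appended at the end.
samples = [f"img_{i}" for i in range(8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

shuffled_samples, shuffled_labels = shuffle_together(samples, labels)
print(list(zip(shuffled_samples, shuffled_labels)))
```

Most training frameworks offer an option to reshuffle the training data every epoch, which is generally preferable to shuffling once.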

Neural Networks – Why Validation Loss Increases While Validation Accuracy Increases

Other answers explain well how accuracy and loss are not necessarily exactly (inversely) correlated, as loss measures a difference between raw prediction (float) and class (0 or 1), while accuracy measures the difference between thresholded prediction (0 or 1) and class. So if raw predictions change, loss changes but accuracy is more "resilient" as predictions need to go over/under a threshold to actually change accuracy.

However, accuracy and loss intuitively seem to be somewhat (inversely) correlated, as better predictions should lead to lower loss and higher accuracy, and the case of higher loss and higher accuracy shown by OP is surprising. I have myself encountered this case several times, and I present here my conclusions based on the analysis I had conducted at the time. There may be other reasons for OP's case.

Let's consider the case of binary classification, where the task is to predict whether an image shows a cat or a horse, and the output of the network is a sigmoid (outputting a float between 0 and 1), where we train the network to output 1 if the image is one of a cat and 0 otherwise. I believe that in this case, two phenomena are happening at the same time.

1. Some images with borderline predictions get predicted better, and so their output class changes (e.g. a cat image whose prediction was 0.4 becomes 0.6). This is the classic "loss decreases while accuracy increases" behavior that we expect.
2. Some images with very bad predictions keep getting worse (e.g. a cat image whose prediction was 0.2 becomes 0.1). This leads to the less classic "loss increases while accuracy stays the same". Note that when one uses cross-entropy loss for classification, as is usually done, bad predictions are penalized much more strongly than good predictions are rewarded. For a cat image (target 1), the loss is $-\log(prediction)$, so even if many cat images are correctly predicted (low loss), a single badly misclassified cat image will have a high loss, hence "blowing up" your mean loss. See this answer for further illustration of this phenomenon. (Increasing loss with stable accuracy could also be caused by good predictions getting a little worse, but I find that less likely because of this loss "asymmetry".)
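To make the asymmetry concrete, here is the standard binary cross-entropy for one image with target $y$ and sigmoid output $p$, evaluated with the natural logarithm on the two cat examples above (values rounded to two decimals):

$$
\ell(p, y) = -\bigl[\, y \log p + (1 - y) \log(1 - p) \,\bigr],
\qquad
\ell(p, 1) = -\log p
$$

$$
\ell(0.4, 1) \approx 0.92 \;\to\; \ell(0.6, 1) \approx 0.51
\qquad
\ell(0.2, 1) \approx 1.61 \;\to\; \ell(0.1, 1) \approx 2.30
$$

The borderline image that flips to the correct class lowers its loss by about 0.41, while the already-wrong image raises its loss by about 0.69, even though it changes nothing in the accuracy.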

So I think that when both accuracy and loss are increasing, the network is starting to overfit, and both phenomena are happening at the same time. The network is starting to learn patterns that are relevant only to the training set and do not generalize, which leads to phenomenon 2: some images from the validation set get predicted really wrong, with an effect amplified by the "loss asymmetry". At the same time, it is still learning some patterns that are useful for generalization (phenomenon 1, "good learning"), as more and more images are being correctly classified.
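A small numeric sketch can show the two phenomena combining. It assumes a toy validation set of four cat images (target 1) and a 0.5 decision threshold. The 0.4 to 0.6 and 0.2 to 0.1 changes are the examples used above; the other two predictions and all function names are invented for illustration.

```python
import math


def bce(prediction, label):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    return -(label * math.log(prediction)
             + (1 - label) * math.log(1 - prediction))


def mean_loss_and_accuracy(predictions, labels, threshold=0.5):
    """Mean cross-entropy and thresholded accuracy over a set."""
    losses = [bce(p, y) for p, y in zip(predictions, labels)]
    correct = [(p >= threshold) == bool(y)
               for p, y in zip(predictions, labels)]
    return sum(losses) / len(losses), sum(correct) / len(correct)


labels = [1, 1, 1, 1]                    # four cat images
earlier_epoch = [0.9, 0.8, 0.4, 0.2]     # two correct, two wrong
later_epoch = [0.9, 0.8, 0.6, 0.1]       # 0.4 -> 0.6 and 0.2 -> 0.1

earlier_loss, earlier_acc = mean_loss_and_accuracy(earlier_epoch, labels)
later_loss, later_acc = mean_loss_and_accuracy(later_epoch, labels)

print(f"earlier: loss={earlier_loss:.3f}, accuracy={earlier_acc:.2f}")
print(f"later:   loss={later_loss:.3f}, accuracy={later_acc:.2f}")
```

Here one image crosses the threshold, so accuracy goes up, while the mean loss also goes up because the badly predicted image gets worse by more than the borderline image improves.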

I sadly have no answer for whether or not this "overfitting" is a bad thing in this case: should we stop the learning once the network is starting to learn spurious patterns, even though it's continuing to learn useful ones along the way?

Finally, I think this effect can be further obscured in the case of multi-class classification, where the network at a given epoch might be severely overfit on some classes but still learning on others.
