Solved – Validation loss increases while training loss decreases

conv-neural-network, deep-learning

I am training a model and the accuracy increases in both the training and validation sets. I am using a pre-trained model as my dataset is very small.
I am not sure why the validation loss increases during fine-tuning:
[plot: fine-tuning loss curves]
while when training from scratch, the validation loss decreases similarly to the training loss:
[plot: training-from-scratch loss curves]

I have added the accuracy plots here as well.
Fine-tuning accuracy:
[plot: fine-tuning accuracy]

Training from scratch accuracy:
[plot: training-from-scratch accuracy]

The pretraining dataset did not contain all of the classes, nor the exact patterns, present in the training set. Does that explain why fine-tuning did not improve accuracy, and why training from scratch performs slightly better than fine-tuning?

Extra Information:

I am using the C3D model, which first divides each video into several "stacks", where a stack is a segment of the video composed of 16 frames. I am trying to learn actions from videos.
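For clarity, the stacking step above can be sketched as follows. This is an illustrative reconstruction, not the actual C3D preprocessing code, and `split_into_stacks` is a hypothetical name; it assumes non-overlapping stacks with leftover frames dropped.

```python
# Hypothetical sketch: split a video's frame sequence into C3D-style
# non-overlapping 16-frame stacks, dropping any leftover frames.
STACK_SIZE = 16

def split_into_stacks(frames, stack_size=STACK_SIZE):
    """Return consecutive, non-overlapping stacks of `stack_size` frames."""
    n_stacks = len(frames) // stack_size
    return [frames[i * stack_size:(i + 1) * stack_size]
            for i in range(n_stacks)]

# A 100-frame video yields 6 full stacks; the last 4 frames are dropped.
stacks = split_into_stacks(list(range(100)))
```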

The C3D model consists of 5 convolutional layers and 3 fully connected layers: https://arxiv.org/abs/1412.0767

Pretraining dataset: 11 classes, with 6646 videos divided into 94069 stacks
Training dataset: 18 classes (11 of which are "almost similar" to the pretraining classes), with 657 videos divided into 6377 stacks

During fine-tuning, I do not freeze any layers, because the training videos were recorded in different locations from the pretraining videos and are visually different from them. Next, I plan to train the model with fewer neurons in the fully connected layers.

The learning rate starts at lr = 0.005 and is divided by 10, 100, and 1000 after epochs 4, 8, and 12 respectively, in both the pretraining and fine-tuning phases.
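The step schedule above can be written out explicitly. This is a minimal sketch under my reading of the question (each divisor applies from the stated epoch onward); `lr_at_epoch` is an illustrative name, not code from the actual training setup.

```python
# Sketch of the described step schedule (assumption: divide the base
# rate by 10, 100, 1000 from epochs 4, 8, 12 onward respectively).
BASE_LR = 0.005

def lr_at_epoch(epoch, base_lr=BASE_LR):
    if epoch >= 12:
        return base_lr / 1000
    if epoch >= 8:
        return base_lr / 100
    if epoch >= 4:
        return base_lr / 10
    return base_lr
```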

Best Answer

This is a case of overfitting. The training loss will always tend to improve as training continues, up until the model's capacity to learn has been saturated. When training loss decreases but validation loss increases, your model has reached the point where it has stopped learning the general problem and started memorizing the training data.
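That signature, training loss still falling while validation loss turns upward, can be detected programmatically. The sketch below uses made-up loss values and a hypothetical `overfit_epoch` helper purely to illustrate the pattern.

```python
# Illustrative sketch: find the epoch where validation loss starts
# rising for several consecutive epochs (a common overfitting signal).
def overfit_epoch(val_losses, patience=2):
    """First epoch from which validation loss rises for `patience`
    consecutive epochs; None if it never does."""
    for e in range(len(val_losses) - patience):
        if all(val_losses[e + i + 1] > val_losses[e + i]
               for i in range(patience)):
            return e
    return None

train = [1.0, 0.7, 0.5, 0.35, 0.25, 0.18]  # keeps improving
val   = [1.0, 0.8, 0.7, 0.75, 0.85, 0.95]  # turns upward at epoch 2
```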

You said you are using a pre-trained model. It is likely that this model was trained with early stopping: the network parameters from the specific epoch that achieved the lowest validation loss were saved and provided as the pretrained weights. If that is the case (which it likely is), any further fine-tuning will probably make the network worse at generalising to the validation set, since it has already achieved its best generalisation.
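Early stopping as described above amounts to keeping the checkpoint from the epoch with the lowest validation loss. A minimal sketch, with made-up losses and an illustrative `best_epoch` helper:

```python
# Minimal early-stopping sketch: the checkpoint kept is the one from
# the epoch with the minimum validation loss (values are hypothetical).
def best_epoch(val_losses):
    """Index of the epoch achieving the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda e: val_losses[e])

val_losses = [0.9, 0.6, 0.45, 0.40, 0.43, 0.50]
checkpoint_epoch = best_epoch(val_losses)  # epoch 3 has the minimum
```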

The reason you don't see this behaviour of validation loss increasing after $n$ epochs when training from scratch is likely an artefact of the optimization schedule you have used. The learning rate decreases over time, so any effects of overfitting are mitigated when training from scratch. When fine-tuning the pre-trained model, the optimizer starts right at the beginning of your learning-rate schedule, so it starts out with a high learning rate; this causes the training loss to decrease rapidly as the model overfits the training data, while the validation loss rapidly increases.

Since you said you are fine-tuning with new training data, I'd recommend trying a much lower learning rate (e.g. $0.0005$) and a less aggressive schedule, since the model could still learn to generalise better to your visually different new training data while retaining the good generalisation properties from pre-training on its original dataset.
