The problem of NOT correcting the bias
According to the paper
In case of sparse gradients, for a reliable estimate of the second moment one needs to average over
many gradients by choosing a small value of β2; however it is exactly this case of small β2 where a
lack of initialisation bias correction would lead to initial steps that are much larger.
Normally in practice $\beta_2$ is set much closer to 1 than $\beta_1$ (the authors suggest $\beta_2=0.999$, $\beta_1=0.9$), so the update coefficient $1-\beta_2=0.001$ is much smaller than $1-\beta_1=0.1$.
In the first step of training $m_1=0.1\,g_1$ and $v_1=0.001\,g_1^2$, so if we use the biased estimates directly, the $m_1/(\sqrt{v_1}+\epsilon)$ term in the parameter update has magnitude roughly $0.1/\sqrt{0.001}\approx 3.16$ regardless of the size of $g_1$, which can make the initial step very large.
On the other hand, when using the bias-corrected estimates, $\hat{m_1}=g_1$ and $\hat{v_1}=g_1^2$, so the $\hat{m_t}/(\sqrt{\hat{v_t}}+\epsilon)$ term becomes less sensitive to $\beta_1$ and $\beta_2$.
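To put rough numbers on this, here is a minimal sketch in plain Python (the scalar gradient `g1` and its value are made up for illustration; `beta1`, `beta2`, `eps` are the paper's suggested defaults):

```python
# Minimal sketch: size of the first Adam step with vs. without bias correction.
beta1, beta2, eps = 0.9, 0.999, 1e-8
g1 = 0.5  # hypothetical scalar gradient at step 1

# Biased estimates after one step (m0 = v0 = 0)
m1 = (1 - beta1) * g1          # 0.1 * g1
v1 = (1 - beta2) * g1 ** 2     # 0.001 * g1^2
print(m1 / (v1 ** 0.5 + eps))  # ~3.16 * sign(g1): much larger than 1

# Bias-corrected estimates
m1_hat = m1 / (1 - beta1)      # = g1
v1_hat = v1 / (1 - beta2)      # = g1^2
print(m1_hat / (v1_hat ** 0.5 + eps))  # ~1 * sign(g1), insensitive to beta1, beta2
```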
How the bias is corrected
The algorithm uses a moving average to estimate the first and second moments. The biased estimate works like this: we start from an arbitrary guess $m_0$ and gradually update the estimate by $m_t=\beta m_{t-1}+(1-\beta)g_t$. So it's obvious that in the first few steps the moving average is heavily biased towards the initial $m_0$.
To correct this, we can remove the effect of the initial guess (the bias) from the moving average. For example, at time 1, $m_1=\beta m_0+(1-\beta)g_1$; we take the $\beta m_0$ term out of $m_1$ and divide by $(1-\beta)$, which yields $\hat{m_1}=(m_1- \beta m_0)/(1-\beta)$. When $m_0=0$, this generalises to $\hat{m_t}=m_t/(1-\beta^t)$. The full proof is given in Section 3 of the paper.
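As a small sketch of this correction in plain Python (the gradient values are made up and $\beta=0.9$ is only an example):

```python
# Sketch: biased vs. bias-corrected moving average of a made-up gradient stream.
beta = 0.9
grads = [1.0, 1.2, 0.8, 1.1]   # hypothetical gradients, all around 1.0

m = 0.0                         # m0 = 0, the arbitrary initial guess
for t, g in enumerate(grads, start=1):
    m = beta * m + (1 - beta) * g
    m_hat = m / (1 - beta ** t)         # remove the influence of m0 = 0
    print(t, round(m, 4), round(m_hat, 4))
# m starts near 0.1 (biased towards m0 = 0), while m_hat stays near the gradient scale ~1.0
```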
As Mark L. Stone aptly commented
It's like multiplying by 2 (oh my, the result is biased), and then dividing by 2 to "correct" it.
However, this is not exactly equivalent to
the gradient at initial point is used for the initial values of these things, and then the first parameter update
(of course it can be turned into the same form by changing the update rule (see the update of this answer), and I believe this comment mainly aims to show that introducing the bias is unnecessary, but perhaps it's worth noticing the difference)
For example, the corrected first moment at time 2 is
$$\hat{m_2}=\frac{\beta(1-\beta)g_1+(1-\beta)g_2}{1-\beta^2}=\frac{\beta g_1+g_2}{\beta+1}$$
If using $g_1$ as the initial value with the same update rule,
$$m_2=\beta g_1+(1-\beta)g_2$$
which is instead biased towards $g_1$ in the first few steps.
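To make the contrast concrete, plugging in the suggested $\beta_1=0.9$ (my own illustration, not from the paper):
$$\hat{m}_2=\frac{0.9\,g_1+g_2}{1.9}\approx 0.47\,g_1+0.53\,g_2,\qquad m_2=0.9\,g_1+0.1\,g_2,$$
so the corrected estimate weights the two gradients almost equally, while initialising with $g_1$ weights $g_1$ nine times as heavily.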
Is bias correction really a big deal
Since it only affects the first few steps of training, it does not seem like a very big issue; in many popular frameworks (e.g. Keras, Caffe) only the biased estimate is implemented.
From my experience the biased estimate sometimes leads to undesirable situations where the loss won't go down (I haven't tested that thoroughly, so I'm not exactly sure whether this is due to the biased estimate or something else), and a trick I use is setting a larger $\epsilon$ to moderate the initial step sizes.
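As an illustration of that trick (assuming a TensorFlow/Keras setup; the value 1e-4 is just an example rather than a recommendation, and `model` stands for some already-built Keras model):

```python
# Sketch: raising epsilon to moderate the initial Adam step sizes.
from tensorflow import keras

# Default epsilon is around 1e-7/1e-8; a larger value damps the early updates.
opt = keras.optimizers.Adam(learning_rate=1e-3, epsilon=1e-4)
model.compile(optimizer=opt, loss="mse")  # `model` is assumed to exist already
```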
Update
If you unfold the recursive update rules, essentially $\hat{m}_t$ is a weighted average of the gradients,
$$\hat{m}_t=\frac{\beta^{t-1}g_1+\beta^{t-2}g_2+...+g_t}{\beta^{t-1}+\beta^{t-2}+...+1}$$
The denominator can be computed with the geometric sum formula, so it is equivalent to the following update rule (which doesn't involve a bias term):
$m_1\leftarrow g_1$
while not converged do
$\qquad m_t\leftarrow \beta m_{t-1} + g_t$ (weighted sum)
$\qquad \hat{m}_t\leftarrow \dfrac{(1-\beta)m_t}{1-\beta^t}$ (weighted average)
Therefore it can be done without introducing a bias term and then correcting it. I think the paper puts it in the bias-correction form for the convenience of comparison with other algorithms (e.g. RMSProp).
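For a quick sanity check that the two forms agree, here is a small sketch in plain Python (the gradient values are made up):

```python
# Sketch: the bias-corrected EMA and the weighted-average form give the same estimate.
beta = 0.9
grads = [0.5, -0.2, 0.3, 0.1, 0.4]   # hypothetical gradient sequence

m = 0.0   # Adam-style biased EMA (m0 = 0)
s = 0.0   # plain weighted sum of gradients
for t, g in enumerate(grads, start=1):
    m = beta * m + (1 - beta) * g
    m_hat_ema = m / (1 - beta ** t)               # bias-corrected form
    s = beta * s + g
    m_hat_avg = (1 - beta) * s / (1 - beta ** t)  # weighted-average form
    print(t, abs(m_hat_ema - m_hat_avg) < 1e-12)  # True at every step
```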
The benefits of Adam can be marginal, at best. The initial results were strong, but there is evidence that Adam converges to dramatically different minima compared to SGD (or SGD + momentum).
"The Marginal Value of Adaptive Gradient Methods in Machine Learning"
Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, and Benjamin Recht
Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We show that for simple over-parameterized problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient descent (SGD). We construct an illustrative binary classification problem where the data is linearly separable, GD and SGD achieve zero test error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to half. We additionally study the empirical generalization capability of adaptive methods on several state-of-the-art deep learning models. We observe that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance. These results suggest that practitioners should reconsider the use of adaptive methods to train neural networks.
Speaking from personal experience, Adam can struggle unless you set a small learning rate -- which sort of defeats the whole purpose of using an adaptive method in the first place, not to mention all of the wasted time spent toying with the learning rate.
Best Answer
Here's one possible interpretation of your loss function's behavior:
I think the key observation here is that the loss function of almost every neural net (or perceptron) has several minima, and we're usually happy if our optimizer finds one that is low enough. The following link explains the concept well: https://www.allaboutcircuits.com/technical-articles/understanding-local-minima-in-neural-network-training/