Machine Learning – Does Increasing Learning Rate Improve Loss When Stagnant?

deep learning, gradient descent, machine learning, neural networks, optimization

My model's loss is nearly stagnant, although there is a very mild decrease. Would it help to increase the learning rate suddenly to get it out of a local optimum? Is this a valid strategy?

Also, if I ended up in a bad local optimum because the learning rate decayed too quickly early on, does it make more sense to use a lower decay rate, or should I simply raise the learning rate manually?

Best Answer

I think your questions are hard to answer with certainty and are the subject of ongoing research. For recent papers on this, you could look at SGDR: Stochastic Gradient Descent with Warm Restarts and Snapshot Ensembles: Train 1, Get M for Free. Both papers show that increasing the learning rate during training (and then decreasing it again) can indeed lead to lower values of your loss function, hinting (although not clearly showing) that the optimization then finds a different local minimum.
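To make the warm-restart idea concrete, here is a minimal sketch of a cosine-annealing schedule with restarts in the spirit of SGDR: the learning rate decays along a cosine curve within each cycle and jumps back up to its maximum when a cycle ends. The specific values (`eta_min`, `eta_max`, `t0`, `t_mult`) are illustrative assumptions, not taken from the paper.

```python
import math

def sgdr_lr(epoch, eta_min=0.001, eta_max=0.1, t0=10, t_mult=2):
    """Learning rate at a given epoch under cosine annealing with warm restarts.

    Cycle lengths grow as t0, t0*t_mult, t0*t_mult**2, ... ; within each
    cycle the rate anneals from eta_max down toward eta_min.
    All hyperparameter values here are illustrative assumptions.
    """
    # Locate the current cycle and the position within it.
    t_i = t0          # length of the current cycle
    t_cur = epoch     # epochs elapsed within the current cycle
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine anneal from eta_max to eta_min over the cycle.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

The sudden jump back to `eta_max` at the start of each cycle is exactly the "increase the learning rate suddenly" idea you describe: it can kick the optimizer out of the basin it has settled into.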

And since you mentioned learning rate decay: these papers also describe how many researchers, at least for some computer vision problems, now schedule their learning rates (which may or may not be what you are already doing): stepwise decreases of the learning rate, i.e. keep the learning rate constant for a certain number of epochs and then drop it at once to a lower value. You can of course have multiple such decreases throughout your training process.
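A stepwise schedule like the one described above can be written as a one-line function; the particular values (`base_lr`, `drop_factor`, `step_size`) are illustrative assumptions.

```python
def step_lr(epoch, base_lr=0.1, drop_factor=0.1, step_size=30):
    """Stepwise schedule: hold the learning rate constant for step_size
    epochs, then multiply it by drop_factor, repeating every step_size
    epochs. Hyperparameter values are illustrative assumptions.
    """
    return base_lr * (drop_factor ** (epoch // step_size))
```

With these defaults the rate stays at 0.1 for epochs 0-29, drops to 0.01 for epochs 30-59, then to 0.001, and so on.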