If you look at the documentation (http://keras.io/optimizers/), there is a `decay` parameter for `SGD`. I know this reduces the learning rate over time, but I cannot figure out how exactly it works. Is it a value that is multiplied by the learning rate, such as `lr = lr * (1 - decay)`, or is it exponential? Also, how can I see what learning rate my model is using? When I print `model.optimizer.lr.get_value()` after running a fit over a few epochs, it gives back the original learning rate even though I set the decay.
Also, do I have to set `nesterov=True` to use momentum, or are there just two different types of momentum I can use? For instance, is there a point to doing this: `sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=False)`?
Best Answer
The documentation that you're referring to includes a link to the Python source (just click on the [Source] link in the appropriate place), which can be used to answer your questions. The most relevant line there shows how `decay` modifies the learning rate: on every update, the effective rate is recomputed from the stored initial rate, the `decay` value, and the iteration count. The stored rate itself is never overwritten, which is why `model.optimizer.lr.get_value()` still returns the value you originally set. The `nesterov` option does not have to be set to `True` for momentum to be used; it results in momentum being applied in a different way, as can again be seen from the source.
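For the decay question, the schedule from the legacy (1.x-era) Keras `SGD` source can be sketched as a plain function; the exact line in newer versions may differ, so treat this as a sketch of the mechanism rather than the current implementation:

```python
def effective_lr(initial_lr, decay, iteration):
    # Time-based decay as applied per batch by the legacy Keras SGD:
    #   lr = self.lr * (1. / (1. + self.decay * self.iterations))
    # The stored lr variable itself is never modified, which is why
    # model.optimizer.lr.get_value() still reports the initial value.
    return initial_lr * (1.0 / (1.0 + decay * iteration))
```

So it is inverse-time decay, not `lr * (1 - decay)` and not exponential, and it is applied per batch (iteration), not per epoch. With `lr=0.1, decay=1e-6` as in the question, the rate only halves after a million updates.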
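And the momentum branch can be sketched the same way, again mirroring the legacy `get_updates` logic (variable names here are my own, not the library's):

```python
def sgd_step(p, g, velocity, lr, momentum, nesterov):
    # The velocity update is identical in both variants.
    v = momentum * velocity - lr * g
    if nesterov:
        # Nesterov momentum: step by the momentum-scaled *new* velocity
        # plus the gradient step (the "lookahead" form of the update).
        new_p = p + momentum * v - lr * g
    else:
        # Classical momentum: just apply the new velocity.
        new_p = p + v
    return new_p, v
```

Either way, `momentum=0.9` is used; `nesterov` only selects which of the two update rules applies. So `SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=False)` is perfectly meaningful: it is ordinary (classical) momentum SGD.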