Solved – Keras, how does SGD learning rate decay work

neural networks, python

If you look at the documentation (http://keras.io/optimizers/), there is a decay parameter in SGD. I know this reduces the learning rate over time, but I cannot figure out how it works exactly. Is it a value that is multiplied by the learning rate on each update, such as lr = lr * (1 - decay)? Is it exponential? Also, how can I see what learning rate my model is using? When I print model.optimizer.lr.get_value() after running a fit for a few epochs, it gives back the original learning rate even though I set the decay.

Also, do I have to set nesterov=True to use momentum, or are these just two different types of momentum I can use? For instance, is there a point to doing this: sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=False)?

Best Answer

The documentation you're referring to includes a reference to the Python source (just click on the [Source] link in the appropriate place), which can be used to answer your questions. Here's the most relevant line, showing how decay modifies the learning rate:

lr = self.lr * (1. / (1. + self.decay * self.iterations))
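So this is an inverse ("1/t") decay in the iteration count, not exponential, and self.lr itself is never overwritten: the decayed rate is recomputed from the iteration counter on every update (self.iterations counts batch updates, not epochs). That is why model.optimizer.lr.get_value() still returns the initial learning rate after fitting. Here is a minimal sketch of the schedule in plain Python, with hypothetical values standing in for self.lr and self.decay:

# Minimal sketch of the time-based decay schedule above.
# initial_lr and decay are hypothetical stand-ins for self.lr and self.decay.
initial_lr = 0.1
decay = 1e-4

for iteration in (0, 1000, 10000, 100000):
    # same formula as the Keras source line quoted above
    lr = initial_lr * (1. / (1. + decay * iteration))
    print("iteration %6d: effective lr = %.6f" % (iteration, lr))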

The nesterov option does not have to be set to True for momentum to be used; it results in momentum being used in a different way, as again can be seen from the source:

v = self.momentum * m - lr * g  # velocity (m is the stored moment, g the gradient)

if self.nesterov:
    # Nesterov: take an extra look-ahead step along the velocity
    new_p = p + self.momentum * v - lr * g
else:
    # classical momentum: step by the velocity alone
    new_p = p + v
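So momentum is used either way; nesterov=True only changes how the step is applied. To make the difference concrete, here is a small self-contained sketch (plain Python, not Keras code) that applies both update rules to the one-dimensional quadratic loss f(p) = p**2, with variable names mirroring the source snippet above:

def grad(p):
    # gradient of f(p) = p**2
    return 2.0 * p

lr, momentum = 0.1, 0.9

for nesterov in (False, True):
    p, m = 1.0, 0.0  # parameter and momentum buffer
    for _ in range(20):
        g = grad(p)
        v = momentum * m - lr * g  # velocity, as in the source above
        if nesterov:
            # look-ahead step along the velocity
            p = p + momentum * v - lr * g
        else:
            p = p + v
        m = v  # the source stores the velocity back into the moment buffer
    print("nesterov=%s -> p = %.6f" % (nesterov, p))

With nesterov=True the parameter gets an extra momentum * v nudge, which is equivalent to evaluating the gradient at the looked-ahead position. So yes, there is still a point to setting momentum=0.9 with nesterov=False; that is simply classical momentum.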