Solved – Keras, how does SGD learning rate decay work

neural networks, python

If you look at the documentation (http://keras.io/optimizers/), there is a decay parameter in SGD. I know this reduces the learning rate over time, but I cannot figure out how it works exactly. Is it a value that is multiplied by the learning rate on each update, such as lr = lr * (1 - decay)? Is it exponential? Also, how can I see what learning rate my model is using? When I print model.optimizer.lr.get_value() after running a fit for a few epochs, it gives back the original learning rate even though I set the decay.

Also, do I have to set nesterov=True to use momentum, or are these just two different types of momentum I can use? For instance, is there a point to doing this: sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=False)?

Best Answer

The documentation you're referring to includes a reference to the Python source (just click on the [Source] link in the appropriate place), which can be used to answer your questions. Here's the most relevant line, showing how decay modifies the learning rate:

lr = self.lr * (1. / (1. + self.decay * self.iterations))
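So this is an inverse ("1/t") decay in the iteration count, not exponential, and self.lr itself is never overwritten: the decayed rate is recomputed from the iteration counter on every update (self.iterations counts batch updates, not epochs). That is why model.optimizer.lr.get_value() still returns the initial learning rate after fitting. Here is a minimal sketch of the schedule in plain Python, with hypothetical values standing in for self.lr and self.decay:

# Minimal sketch of the time-based decay schedule above.
# initial_lr and decay are hypothetical stand-ins for self.lr and self.decay.
initial_lr = 0.1
decay = 1e-4

for iteration in (0, 1000, 10000, 100000):
    # same formula as the Keras source line quoted above
    lr = initial_lr * (1. / (1. + decay * iteration))
    print("iteration %6d: effective lr = %.6f" % (iteration, lr))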

The nesterov option does not have to be set to True for momentum to be used; it results in momentum being used in a different way, as again can be seen from the source:

v = self.momentum * m - lr * g  # velocity (m is the stored moment, g the gradient)

if self.nesterov:
    # Nesterov: take an extra look-ahead step along the velocity
    new_p = p + self.momentum * v - lr * g
else:
    # classical momentum: step by the velocity alone
    new_p = p + v
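So momentum is used either way; nesterov=True only changes how the step is applied. To make the difference concrete, here is a small self-contained sketch (plain Python, not Keras code) that applies both update rules to the one-dimensional quadratic loss f(p) = p**2, with variable names mirroring the source snippet above:

def grad(p):
    # gradient of f(p) = p**2
    return 2.0 * p

lr, momentum = 0.1, 0.9

for nesterov in (False, True):
    p, m = 1.0, 0.0  # parameter and momentum buffer
    for _ in range(20):
        g = grad(p)
        v = momentum * m - lr * g  # velocity, as in the source above
        if nesterov:
            # look-ahead step along the velocity
            p = p + momentum * v - lr * g
        else:
            p = p + v
        m = v  # the source stores the velocity back into the moment buffer
    print("nesterov=%s -> p = %.6f" % (nesterov, p))

With nesterov=True the parameter gets an extra momentum * v nudge, which is equivalent to evaluating the gradient at the looked-ahead position. So yes, there is still a point to setting momentum=0.9 with nesterov=False; that is simply classical momentum.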