Solved – How to set mini-batch size in SGD in Keras

gradient descent, keras, neural networks, python, stochastic gradient descent

I am new to Keras and need your help.

I am training a neural network in Keras, and my loss function is the squared difference between the network's output and the target value.

I want to optimize this using gradient descent. After going through some links on the net, I have learned that three types of gradient descent are generally used:

  1. Single-sample gradient descent: the gradient is computed from only one sample in every iteration, so it can be noisy.
  2. Batch gradient descent: the gradient is the average of the gradients computed from ALL the samples in the dataset. This gives a less noisy estimate, but it is intractable for huge datasets.
  3. Mini-batch gradient descent: similar to batch gradient descent, but instead of the entire dataset, only a few of the samples (the number is set by batch_size) are used to compute the gradient in every iteration. The gradient is not very noisy and the computation stays tractable, so it is the best of both worlds.
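The three variants above differ only in how many samples feed each weight update, which a small sketch can make concrete. This is a plain-Python illustration, not Keras code: the one-weight model y = w*x, the helper names `gradient` and `run_epoch`, the learning rate, and the toy data are all assumptions made up for the example.

```python
import random


def gradient(w, xs, ys):
    """Average gradient of the squared error (w*x - y)**2 with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def run_epoch(w, xs, ys, batch_size, lr=0.01, seed=0):
    """One pass over the data; returns the new weight and the number of updates."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)  # visit the samples in a shuffled order
    updates = 0
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        bx = [xs[i] for i in batch]
        by = [ys[i] for i in batch]
        w -= lr * gradient(w, bx, by)  # exactly one weight update per batch
        updates += 1
    return w, updates


# Toy data generated from y = 3x (hypothetical values, for illustration only).
xs = [i / 100 for i in range(1, 101)]
ys = [3.0 * x for x in xs]

_, n_single = run_epoch(0.0, xs, ys, batch_size=1)        # single-sample GD
_, n_batch = run_epoch(0.0, xs, ys, batch_size=len(xs))   # batch GD
_, n_mini = run_epoch(0.0, xs, ys, batch_size=20)         # mini-batch GD
print(n_single, n_batch, n_mini)  # 100 1 5
```

With 100 samples, a batch size of 1 gives 100 noisy updates per epoch, a batch size of 100 gives a single averaged update, and a batch size of 20 gives 5 updates in between.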

Questions:

  1. I would like to perform Mini-batch Gradient Descent in Keras. How can I do this? Should I use the SGD optimizer?
  2. If SGD is to be used, how do I set the batch_size? There doesn't seem to be a parameter to the SGD function to set batch_size.

    optimizer = keras.optimizers.SGD(lr=0.01, decay=0.1, momentum=0.1, nesterov=False)
    
  3. There is a batch_size parameter in model.fit() in Keras.

    history = model.fit(x, y, nb_epoch=num_epochs, batch_size=20, verbose=0, validation_split=0.1)
    

    Is this the same as the batch size in mini-batch gradient descent? If not, what exactly does it mean to train on a batch of inputs?
    Does it mean that batch_size threads run in parallel and update the model weights in parallel?

If it helps, here's the Python code snippet I have written so far.

Best Answer

Yes, you are right. In Keras, batch_size refers to the batch size in mini-batch gradient descent. If you want to run batch gradient descent, you need to set batch_size to the number of training samples. Your code looks perfect, except that I don't understand why you store the result of model.fit in an object called history.
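To see what a given batch_size means in practice, it can help to count the weight updates per epoch. The sketch below is plain Python, not a Keras API: the helper `updates_per_epoch` and the dataset size of 1000 are hypothetical, and it assumes that the validation_split fraction is held out before batching and that a final, smaller batch still triggers an update.

```python
import math


def updates_per_epoch(n_samples, batch_size, validation_split=0.0):
    """Number of weight updates in one epoch of mini-batch training.

    Assumption: the validation fraction is held out first, and the last,
    possibly smaller, batch is still used for an update.
    """
    n_train = int(n_samples * (1.0 - validation_split))
    return math.ceil(n_train / batch_size)


n = 1000  # hypothetical dataset size
print(updates_per_epoch(n, batch_size=20, validation_split=0.1))  # 45
print(updates_per_epoch(n, batch_size=n))                          # 1
print(updates_per_epoch(n, batch_size=1))                          # 1000
```

Under these assumptions, the call in the question (batch_size=20, validation_split=0.1) would train on 900 of 1000 samples, so full batch gradient descent would need batch_size=900 rather than 1000.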