Take a look at the image below from Off Convex. In a convex function (leftmost image), there is only one local minimum, which is also the global minimum. But in a non-convex function (rightmost image), there may be multiple local minima, and the region joining two local minima is often a saddle point. If you approach it from a higher point, the gradient there is comparatively flat, and you risk getting stuck, especially if you are moving along only one direction.
Now the thing is, whether you optimize using mini-batch or stochastic gradient descent, the underlying non-convex function is the same, and the gradient is a property of this function. When doing mini-batch, you consider many samples at a time and take a gradient step averaged over all of them. This reduces the variance. But if the averaged gradient still points toward the saddle point, you still risk getting stuck there. The analogy is: if you take 2 steps forward and 1 step back, averaging over those, you ultimately end up taking 1 step forward.
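To make the averaging concrete, here is a minimal sketch of the "2 steps forward, 1 step back" analogy; the per-sample gradient values and the batch size of 3 are made up purely for illustration. SGD applies each noisy gradient in turn, while a mini-batch applies their average, which points the same way but with less variance.

```python
import numpy as np

# Toy per-sample gradients along one dimension: "2 steps forward, 1 step back".
# These values are illustrative, not from any real model.
per_sample_grads = np.array([+2.0, +2.0, -1.0])

sgd_steps = per_sample_grads                  # SGD: apply each noisy gradient in turn
minibatch_step = per_sample_grads.mean()      # mini-batch: one averaged, lower-variance step

print("SGD steps:      ", sgd_steps)          # [ 2.  2. -1.]
print("mini-batch step:", minibatch_step)     # 1.0 -- same average direction, less noise
```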
If you perform SGD instead, you take the steps one after the other, but if you are still moving along a single direction, you can reach the saddle point and find that the gradient on all sides is fairly flat and the step size is too small to carry you over this flat part. This has nothing to do with whether you considered a bunch of points at once or one by one in random order.
Take a look at the visualization here. Even with SGD, if the fluctuations occur only along one dimension and the steps keep getting smaller, it will converge at the saddle point. In this case, the mini-batch method would merely reduce the amount of fluctuation, but would not be able to change the direction of the gradient.
SGD can sometimes break out of simple saddle points, if the fluctuations are along other directions and the step size is large enough for it to go over the flat region. But sometimes the saddle regions can be fairly complex, such as in the image below.
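To make the saddle-point behaviour concrete, here is a minimal sketch of plain gradient descent on $f(x, y) = x^2 - y^2$, which has a saddle at the origin; the function, step size, and starting points are chosen purely for illustration. Starting exactly on the x-axis, every step points straight at the saddle and the iterates get stuck; the slightest fluctuation along y is enough to escape.

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a saddle point at (0, 0): a minimum along x, a maximum along y.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def descend(start, lr=0.1, steps=100):
    p = np.array(start, dtype=float)
    for _ in range(steps):
        p -= lr * grad(p)
    return p

# Moving only along the x direction: every step points straight at the saddle.
print(descend([1.0, 0.0]))    # ~[0, 0] -- stuck at the saddle

# Any fluctuation along y (here a tiny offset) is amplified, and the iterate escapes.
print(descend([1.0, 1e-6]))   # y grows away from 0 -- the iterate leaves the saddle
```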
The way methods like momentum, Adagrad, Adam, etc. are able to break out of this is by taking the past gradients into account. Consider momentum,
$$
v_t = \gamma v_{t-1} + \eta \nabla_{\theta} J(\theta)
$$
which adds a fraction of the previous update, $v_{t-1}$. If you have just been going back and forth along one direction, essentially flipping signs, the momentum term dampens your oscillation. But if there has consistently been positive progress in one direction, the term builds up and carries you down that way.
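A minimal sketch of this update rule shows both effects; the values of $\gamma$, $\eta$, and the gradient sequences are illustrative. Alternating gradients largely cancel, while consistent gradients accumulate.

```python
def momentum_updates(grads, gamma=0.9, eta=0.1):
    """Apply v_t = gamma * v_{t-1} + eta * grad and return the successive updates."""
    v, updates = 0.0, []
    for g in grads:
        v = gamma * v + eta * g
        updates.append(round(v, 4))
    return updates

# Oscillating gradients (back and forth in one direction): updates mostly cancel and stay small.
print(momentum_updates([+1, -1, +1, -1, +1, -1]))

# Consistent gradients (steady progress in one direction): updates build up.
print(momentum_updates([+1, +1, +1, +1, +1, +1]))
```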
The fact that SGD is usually faster/better than BGD can be understood in the following two ways:
1) For large datasets, it is common that one portion of the dataset resembles another in terms of the patterns it encodes, so passing over the whole set for every update is wasteful.
2) Stochastic gradient descent also enables you to jump from one valley to another. In this sense, the solution is not trapped around a local minimum determined by the initialization of your NN, so in principle you can find better-trained parameters using SGD than with BGD.
If I am correct, the state-of-the-art algorithm is mini-batch GD, which combines the benefits of both SGD and BGD; a minimal sketch follows below.
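As a rough illustration of how the three flavours differ only in how many samples feed each update, here is a sketch on a toy linear-regression problem; the data, learning rate, and function names are invented for this example.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Linear-regression gradient descent; batch_size selects the flavour:
    1 -> SGD, len(X) -> batch GD, anything in between -> mini-batch GD."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))                     # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)    # MSE gradient on this batch
            w -= lr * grad
    return w

# Toy data: y ~ 3*x with a little noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((256, 1))
y = 3 * X[:, 0] + 0.1 * rng.standard_normal(256)

print(gradient_descent(X, y, batch_size=1))        # stochastic GD
print(gradient_descent(X, y, batch_size=32))       # mini-batch GD
print(gradient_descent(X, y, batch_size=len(X)))   # batch GD
```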
A good and more thorough discussion of the advantages and disadvantages of batch, stochastic, and mini-batch GD can be found in this paper: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf (starting from page 5).
Best Answer
Yes, you are right. In Keras, `batch_size` refers to the batch size in mini-batch gradient descent. If you want to run batch gradient descent, you need to set `batch_size` to the number of training samples. Your code looks perfect, except that I don't understand why you store the return value of `model.fit` in an object called `history`.
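For completeness, here is a minimal Keras sketch of this setup; the model and data are hypothetical, invented for the example. Setting `batch_size` to the number of training samples gives batch gradient descent. Note also that `model.fit` returns a `History` object whose `.history` dict holds the per-epoch metrics, which is the usual reason for keeping its return value in a variable named `history`.

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy data and model purely for illustration.
x_train = np.random.randn(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])

# batch_size=32 (the default) -> mini-batch gradient descent
# batch_size=1                -> stochastic gradient descent
# batch_size=len(x_train)     -> batch gradient descent (one update per epoch)
history = model.fit(x_train, y_train, epochs=5, batch_size=len(x_train))

# model.fit returns a History object; history.history holds the per-epoch metrics.
print(history.history["loss"])
```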