Solved – Does Keras SGD optimizer implement batch, mini-batch, or stochastic gradient descent

keras, neural-networks, stochastic-gradient-descent

I am a newbie to deep learning libraries and decided to go with Keras. While implementing an NN model, I saw the batch_size parameter in model.fit().

Now, I was wondering: if I use the SGD optimizer and set batch_size to 1, m, or b (where m = the number of training examples and 1 < b < m), would I actually be implementing stochastic, batch, or mini-batch gradient descent respectively? On the other hand, I felt that using SGD as the optimizer might ignore the batch_size parameter altogether, since SGD stands for Stochastic Gradient Descent and should therefore always use a batch_size of 1 (i.e. use a single data point for each iteration of gradient descent).

I would be grateful if someone could clarify which of the above two cases is true.

Best Answer

It works just as you suggest. The batch_size parameter does exactly what you would expect: it sets the size of each batch:

  • batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.
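
To make this concrete, here is a minimal sketch (the model, data shapes, and hyperparameter values are purely illustrative) showing that the same SGD optimizer covers all three regimes, depending only on the batch_size argument passed to model.fit():

```python
import numpy as np
import tensorflow as tf

m = 100                                    # number of training examples
x = np.random.rand(m, 4).astype("float32")
y = np.random.rand(m, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")

model.fit(x, y, batch_size=1, epochs=1)    # stochastic GD: one example per update
model.fit(x, y, batch_size=32, epochs=1)   # mini-batch GD: 1 < b < m
model.fit(x, y, batch_size=m, epochs=1)    # (full-)batch GD: all m examples per update
```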

From a programming point of view, Keras decouples the weight-update parameters specific to each optimizer (learning rate, momentum, etc.) from the global training properties (batch size, training length, etc.) that are shared between methods. It is a matter of convenience: there is no point in having separate SGD, MBGD, and BGD optimizers that all do the same thing, just with a different batch size.
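
The following sketch (again with illustrative data and hyperparameters) shows that decoupling: update-rule parameters such as learning rate and momentum belong to the optimizer object, while batch size and training length are passed to fit() and stay the same no matter which optimizer you plug in:

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(100, 4).astype("float32")
y = np.random.rand(100, 1).astype("float32")

# Optimizer-specific hyperparameters go into the optimizer constructor;
# batch_size and epochs are global training properties given to fit().
optimizer_factories = (
    lambda: tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    lambda: tf.keras.optimizers.Adam(learning_rate=0.001),
)

for make_opt in optimizer_factories:
    model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
    model.compile(optimizer=make_opt(), loss="mse")
    model.fit(x, y, batch_size=32, epochs=1)   # same training properties for every optimizer
```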