Solved – How to implement momentum in mini-batch gradient descent

gradient descent, machine learning, neural networks

I understand the idea behind momentum, and how to implement it with batch gradient descent, but I'm not sure how to implement it with mini-batch gradient descent. As I understand it, implementing momentum in batch gradient descent goes like this:

for example in training_set:
    calculate gradient for this example
    accumulate the gradient
for w, g in zip(weights, gradients):
    w = w - learning_rate * g + momentum * gradients_at[-1]

Where gradients_at records the gradients for each weight at backprop iteration t.

Is this correct? If so, what modifications are necessary to apply this technique in mini-batch gradient descent?

Best Answer

The only difference between full-batch and mini-batch gradient descent is that each update uses part of the dataset rather than the whole thing. So you calculate the gradient for only the subset of samples in the current mini-batch and apply the same update rule as before; the momentum term carries over from one mini-batch update to the next. Repeat over many update steps, drawing a different subset each time, so that one pass through all the subsets makes up an epoch.
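
Below is a minimal sketch of this in Python/NumPy, assuming a hypothetical compute_gradient(params, X_batch, y_batch) helper that stands in for your model's backprop and returns one gradient array per parameter; the velocity buffers carry the momentum across mini-batch updates exactly as they would across full-batch updates.

    import numpy as np

    def minibatch_sgd_momentum(params, X, y, compute_gradient,
                               learning_rate=0.01, momentum=0.9,
                               batch_size=32, n_epochs=10):
        """Mini-batch gradient descent with classical (heavy-ball) momentum.

        compute_gradient is a stand-in for your model's backprop; it should
        return a list of gradients, one per parameter array in params.
        """
        # One velocity buffer per parameter, initialised to zero.
        velocities = [np.zeros_like(p) for p in params]
        n_samples = X.shape[0]

        for epoch in range(n_epochs):
            # Shuffle once per epoch so each epoch sees different mini-batches.
            order = np.random.permutation(n_samples)
            for start in range(0, n_samples, batch_size):
                batch_idx = order[start:start + batch_size]
                grads = compute_gradient(params, X[batch_idx], y[batch_idx])

                # Same update rule as full-batch momentum; only the gradient
                # is estimated from the current mini-batch.
                for p, v, g in zip(params, velocities, grads):
                    v *= momentum              # decay the previous velocity
                    v -= learning_rate * g     # add the current mini-batch step
                    p += v                     # apply the accumulated update
        return params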