Solved – When the data set size is not a multiple of the mini-batch size, should the last mini-batch be smaller, or contain samples from other batches

deep learning, gradient descent, neural networks

When training an artificial neural network using stochastic gradient descent with mini-batches, if the data set size is not a multiple of the mini-batch size, should the last mini-batch contain fewer samples? Or is it preferable to have the last mini-batch contain the same number of samples as the other batches, by randomly adding samples from other batches (which is the strategy used here and here)?

Best Answer

Same number, otherwise you're putting more weight on the samples in the final mini-batch (unless you scale down the learning rate to match the smaller size).
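To illustrate the scaling idea, here is a minimal sketch (the function name `lr_for_batch` and the numbers are just assumptions for illustration): since the mean gradient of a short batch averages over fewer samples, shrinking the learning rate proportionally keeps each sample's contribution to the update roughly the same as in a full batch.

```python
base_lr = 0.01
batch_size = 32

def lr_for_batch(n_in_batch, base_lr=base_lr, full_size=batch_size):
    """Per-batch learning rate, scaled down in proportion to batch size."""
    return base_lr * n_in_batch / full_size

# A final batch of 20 samples gets 20/32 of the base learning rate.
print(lr_for_batch(20))  # 0.00625
```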

Adding random samples from the training set should be fine too (as long as your sampling pool includes the runt minibatch), since each sample has an equal chance of being seen twice in an epoch.

Or just do a modulo and grab samples from the beginning again.
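A rough sketch of both padding strategies (the helper `minibatch_indices` and the `strategy` flag are hypothetical, not from any particular library): either wrap around to the start of the shuffled order, or fill the short final batch with samples drawn at random from the epoch.

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch_indices(n_samples, batch_size, strategy="wrap"):
    """Yield equally sized index arrays for one epoch.

    strategy="wrap"    : fill the short final batch from the start of the
                         shuffled order (the modulo approach).
    strategy="resample": fill it with samples drawn at random from the epoch.
    """
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        batch = order[start:start + batch_size]
        shortfall = batch_size - len(batch)
        if shortfall > 0:
            if strategy == "wrap":
                extra = order[:shortfall]
            else:
                extra = rng.choice(order, size=shortfall, replace=False)
            batch = np.concatenate([batch, extra])
        yield batch

# 100 samples with batch size 32 -> four batches of 32 (the last one padded).
for idx in minibatch_indices(100, 32):
    print(len(idx))
```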

In practice, it probably doesn't matter much.
