Solved – Should training data be resampled into mini-batches only one time or at each epoch when using mini-batch training

deep learning, neural networks, sampling

I saw some related questions about whether one should sample with or without replacement when using mini-batches.

However, my question is different.

Let's assume that I use sampling without replacement, meaning that within each epoch one training example can only belong to one 'batch size' group (mini-batch).

I want to know whether one should split the training data into the different mini-batches only once, or resample it again before starting the parameter updates of each new epoch.

For instance, suppose I have 20 'batch size' groups, and suppose that at the first epoch I have sampled picture A into group 2 and picture B into group 4.

If so, should picture A always stay in group 2 and picture B in group 4? Or should I resample before starting a new epoch (e.g. at epoch 2 put picture A in group 6 and picture B in group 20)?

Best Answer

The typical usage of SGD is that each minibatch is constructed completely at random without replacement. (There are other ways to construct minibatches; for some comparison of alternatives, see Why do neural network researchers care about epochs?)

Suppose that you have 6 samples: $S=\{A, B, C, D, E, F\}$ and minibatch size 2. The first minibatch could be $\{A, C\}$, the second $\{D, F\}$, and the third $\{B, E\}$. So what's happening at the first minibatch is that you're sampling 2 examples without replacement from the set $S$. At the second minibatch, you're sampling 2 examples without replacement from $S\setminus\{A,C\}=\{B,D,E,F\}$. At the third minibatch, you're sampling 2 examples without replacement from $S\setminus\{A,C,D,F\}=\{B,E\}$. At this final step, there is only a single possible minibatch because you have a minibatch size of 2, you're sampling without replacement, and only 2 examples remain.
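Here is a minimal sketch of that procedure, assuming NumPy is available: shuffling the whole dataset once and slicing it into consecutive chunks is equivalent to repeatedly drawing `batch_size` examples without replacement from whatever remains. The function name `epoch_minibatches` is purely illustrative.

```python
import numpy as np

def epoch_minibatches(samples, batch_size, rng):
    """Yield the minibatches of one epoch by sampling without replacement.

    A single shuffle of the whole dataset, sliced into consecutive chunks,
    gives the same distribution as repeatedly drawing `batch_size` examples
    without replacement from the remaining pool.
    """
    order = rng.permutation(len(samples))
    for start in range(0, len(samples), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

rng = np.random.default_rng(0)
samples = ["A", "B", "C", "D", "E", "F"]
for batch in epoch_minibatches(samples, batch_size=2, rng=rng):
    print(batch)  # e.g. ['A', 'C'], then ['D', 'F'], then ['B', 'E']
```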

Now you've exhausted all of your training samples, so the next epoch starts. Each epoch is constructed completely at random, so a valid sequence of minibatches is first $\{D, F\}$, second $\{A, B\}$, and third $\{C,E\}$. It's ok that the minibatch $\{D,F\}$ appears in both epoch 1 and epoch 2 -- this happened purely due to randomness. (In more realistic usage, where much more training data is available, it becomes increasingly unlikely that consecutive epochs contain one or more identical mini-batches, unless the mini-batch size is 1.)
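As a sketch of the full behavior described above, again assuming NumPy: the permutation is simply redrawn at the start of every epoch, so the grouping of examples into minibatches changes from epoch to epoch. `update_parameters` here is a hypothetical stand-in for one SGD step.

```python
import numpy as np

def train(samples, batch_size, num_epochs, update_parameters, seed=0):
    """Re-shuffle the training set at the start of every epoch, so the
    grouping of examples into minibatches changes between epochs."""
    rng = np.random.default_rng(seed)
    for epoch in range(num_epochs):
        order = rng.permutation(len(samples))        # fresh shuffle each epoch
        for start in range(0, len(samples), batch_size):
            batch = [samples[i] for i in order[start:start + batch_size]]
            update_parameters(batch)                 # one SGD step per minibatch

# Printing the batches instead of updating parameters shows that the
# composition of the minibatches differs between epochs.
train(["A", "B", "C", "D", "E", "F"], batch_size=2, num_epochs=2,
      update_parameters=print)
```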