In SGD, an epoch is one full presentation of the training data, so there are N weight updates per epoch (if the training set contains N examples).
If we now use mini-batches instead, say of size 20, does one epoch consist of N/20 weight updates, or is the epoch 'lengthened' by a factor of 20 so that it contains the same number of weight updates?
I ask because in a couple of papers the learning seems too quick for the stated number of epochs.
Best Answer
In neural network terminology:

- one epoch = one forward pass and one backward pass over all of the training examples;
- batch size = the number of training examples used in one forward/backward pass;
- one iteration = one forward/backward pass, and hence one weight update, using one batch of examples.

Example: if you have 1000 training examples and your batch size is 500, then it takes 2 iterations to complete 1 epoch.

So for your case: with mini-batches of size 20, one epoch consists of N/20 weight updates (ceil(N/20) if N is not divisible by 20). The epoch is not lengthened; it is still one full pass over the data, just with fewer, less noisy updates than plain SGD.
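To make the counting concrete, here is a minimal sketch of mini-batch SGD on a toy linear least-squares problem. The data, model, and learning rate are made up purely for illustration; the point is only how many weight updates occur per epoch:

```python
import numpy as np

# Toy setup: N = 1000 examples, mini-batch size 20 (illustrative values).
N, batch_size = 1000, 20
X = np.random.randn(N, 10)           # features
y = np.random.randn(N)               # targets
w = np.zeros(10)                     # weights of a linear model
lr = 0.01                            # learning rate

updates = 0
for epoch in range(3):               # 3 epochs = 3 full passes over the data
    perm = np.random.permutation(N)  # shuffle once per epoch
    for start in range(0, N, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of mean squared error on this mini-batch only
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad               # one weight update per mini-batch
        updates += 1

print(updates)                       # 3 * (1000 / 20) = 150, i.e. N/20 updates per epoch
```

Running it prints 150: each epoch contributes N/20 = 50 updates, not N, which is why papers using mini-batches can report surprisingly few epochs.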