Solved – full-batch vs online learning vs mini-batch

machine learning

This is a question from a Coursera course:

Suppose we have a set of examples and Brian comes in and duplicates every example, then randomly reorders the examples. We now have twice as many examples, but no more information about the problem than we had before. If we do not remove the duplicate entries, which one of the following methods will not be affected by this change, in terms of the computer time (time in seconds, for example) it takes to come close to convergence?

a) full-batch learning
b) online learning, where for every iteration we randomly pick a training case
c) mini-batch learning where for every iteration we randomly pick 100 training cases

The answer is b, but I wonder why c is wrong. Isn't online learning a special case of mini-batch learning where each iteration contains only a single training case?

Best Answer

Isn't online learning a special case of mini-batch learning where each iteration contains only a single training case?

This is true, but somewhat irrelevant here, since the question specifically compares the full batch, batch size 1, and batch size 100.

(b) will be completely unaffected by the change (modulo memory usage and cache-efficiency issues): a single training case drawn uniformly at random from the duplicated set has exactly the same distribution as one drawn from the original set, so each step is statistically identical to before and costs the same. (Well, depending on the formulation, there is one caveat: if the loss is a sum over training cases plus a fixed regularization term, doubling the number of cases effectively halves the regularization constant.)
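Here's a quick sanity check (my own sketch, not from the course; the ten integer "training cases" are just stand-ins) showing that a batch-size-1 draw from the duplicated set has the same distribution as a draw from the original set:

```python
import random
from collections import Counter

random.seed(0)
original = list(range(10))      # 10 hypothetical training cases
duplicated = original * 2       # Brian's duplicated set...
random.shuffle(duplicated)      # ...randomly reordered

draws = 100_000
freq_orig = Counter(random.choice(original) for _ in range(draws))
freq_dup = Counter(random.choice(duplicated) for _ in range(draws))

# Both frequency tables are ~uniform (about draws/10 per case), so each
# batch-size-1 step sees the same distribution of cases and costs the same.
for case in original:
    print(case, freq_orig[case], freq_dup[case])
```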

(c) is affected because when we draw a batch of 100 without replacement from the duplicated set, we may draw both copies of the same training case. That case is then weighted twice within the batch while adding no new information, and it displaces another case that would have. The batch is therefore a slightly worse sample of the training-data distribution, each gradient estimate is slightly noisier, and learning will probably take slightly longer to come close to convergence.
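You can see the effect with a small simulation (again my own sketch; the training-set size of 1,000 and the 10,000 trials are arbitrary). It counts how many distinct cases a without-replacement batch of 100 actually contains:

```python
import random

random.seed(0)
n_examples = 1000                    # hypothetical training-set size
original = list(range(n_examples))
duplicated = original * 2            # two copies of every case

def mean_distinct(data, batch_size=100, trials=10_000):
    """Average number of distinct cases in a batch sampled without replacement."""
    total = 0
    for _ in range(trials):
        batch = random.sample(data, batch_size)   # without replacement
        total += len(set(batch))
    return total / trials

print(mean_distinct(original))    # always 100.0: every case is unique
print(mean_distinct(duplicated))  # ~97.5: some slots wasted on duplicates
```

With the duplicated data, the average drops to about 97.5 distinct cases per 100-case batch, so roughly 2.5% of each batch's computation goes to re-processing information the batch already contains.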