Is it possible to learn with batch size = 1?

gradient-descent, machine-learning, neural-networks, optimization

Due to an out-of-memory (OOM) error, I can only set the batch size to 1 or 2.

Is it possible to learn with such a low batch size?

Thanks!

Best Answer

Yes. Before mini-batching became common, stochastic gradient descent (SGD) referred specifically to updating the parameters after each individual example, i.e. a batch size of one.
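
For concreteness, here is a minimal PyTorch sketch of that classic per-example regime; the toy model and synthetic data are placeholders for your actual task:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; swap in your own.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

data = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=1, shuffle=True)  # one sample per step

for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()  # one parameter update per single example
```
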

You can also simulate a larger batch size: accumulate the gradients from several consecutive small batches before applying a single parameter update. This is called gradient accumulation.
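
As a rough illustration, here is one way gradient accumulation might look in PyTorch. The model, data, and the `accum_steps` value are illustrative assumptions, not part of the original answer:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; swap in your own.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

data = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=1)  # the batch size memory allows

accum_steps = 8  # effective batch size = 8 * 1 (illustrative choice)

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y)
    # Dividing by accum_steps makes the accumulated gradient the mean
    # over the effective batch rather than the sum.
    (loss / accum_steps).backward()  # .backward() adds into the .grad buffers
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # update once per effective batch
        optimizer.zero_grad()  # clear gradients for the next window
```

The key point is that `.backward()` accumulates into the parameters' `.grad` buffers by default, so you only pay memory for one small batch at a time while the update behaves like a larger one.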