Solved – Does online data augmentation make sense

data augmentation, gradient descent, neural networks

Data augmentation is popularly done online, since that is how it is typically implemented and recommended in neural-network frameworks such as Keras and TensorFlow. I have also seen it described in, e.g., the AlexNet paper.

Online data augmentation implies that the network sees an effectively different dataset every epoch. Superficially this seems like a good idea, especially because it is essentially computationally free: you can augment the next batch on the CPU while the GPU trains on the previous one. Some people say it improves generalization, but beyond that I have not found quality literature supporting and explaining the idea.
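For concreteness, here is a minimal sketch of what online augmentation can look like with a tf.data pipeline (the dataset, transformations, and hyperparameters are illustrative assumptions, not part of the question): new random transformations are sampled every time an example is read, and prefetching lets the CPU prepare the next batch while the accelerator trains on the current one.

```python
import tensorflow as tf

def augment(image, label):
    # Random transformations are drawn fresh each time an example is read,
    # so every epoch sees a slightly different version of each image.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.resize_with_crop_or_pad(image, 36, 36)  # pad, then random crop back
    image = tf.image.random_crop(image, [32, 32, 3])
    return image, label

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()

ds = (
    tf.data.Dataset.from_tensor_slices((x_train / 255.0, y_train))
    .shuffle(10_000)
    .map(augment, num_parallel_calls=tf.data.AUTOTUNE)  # augmentation runs on CPU threads
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the current one trains
)
```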

My issue with online data augmentation is that I thought gradient-based learning algorithms were fundamentally built on repetition (i.e., seeing the same dataset every epoch), which would make sense intuitively. Is this actually the case, or did I make it up? Is there any literature covering this?

Best Answer

From an optimization standpoint, repetition is nice: we are optimizing the same function at every step. From a modeling standpoint, repetition risks memorizing the training data without learning anything generalizable. For image data, online augmentation is motivated by the observation that we can translate an image, add noise, or otherwise distort it while retaining its key semantic content (a human can still recognize it). The hypothesis behind online augmentation is that the model will probably never see exactly the same image twice, so memorization is unlikely and the model should generalize better.
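As an illustrative sketch of that hypothesis (the specific transformations and noise level are assumptions for demonstration), two passes over the same underlying example yield different pixels while the content is preserved:

```python
import tensorflow as tf

image = tf.random.uniform([32, 32, 3])  # stand-in for one training image

def random_distort(img):
    # Each call samples a new flip and new noise, so the exact pixels differ
    # from epoch to epoch while the semantic content is unchanged.
    img = tf.image.random_flip_left_right(img)
    return img + tf.random.normal(tf.shape(img), stddev=0.02)

epoch_1 = random_distort(image)
epoch_2 = random_distort(image)
print(bool(tf.reduce_all(epoch_1 == epoch_2)))  # almost surely False: no exact repetition
```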
