Data Augmentation – Does Data Augmentation with White Noise Improve Accuracy of Deep Learning Models?

data-augmentation, machine-learning, neural-networks

I was reading Aurélien Géron's Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. In Chapter 14 I read something about data augmentation whose validity I could not be sure of. The author says that augmented images should be generated in such a way that a human would have difficulty telling whether a given image is augmented or original. He goes on to say that the modifications should be learnable, and therefore should not be white noise, since white noise is not learnable.

I could not really be sure of this, because I know there are jittering methods that are used in the data augmentation step. So do they really not work? Is adding noise to the data really meaningless, or is there something I am missing?

Can you also demonstrate some cases with examples?

Best Answer

I have to say that I disagree with it. Adding noise to the data aims to improve generalization performance (let's not constrain this to accuracy), because the added noise (which can be white as well) makes it harder for the learning algorithm to memorize the training examples. So the aim is not to learn the added noise at all, which is why it may even be better if the added noise is white and not learnable. Some models (like simple linear regression) even explicitly assume that the data samples carry additive Gaussian noise: the standard formulation is $y = \mathbf{x}^\top\boldsymbol{\beta} + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, i.e. white Gaussian noise on the observations. Many papers have experimented with this idea; the following link provides a good set of starting references, I suppose.
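To make the jittering idea concrete, here is a minimal Keras sketch (not from the book or from any of the papers mentioned; the MNIST-like 28×28×1 input shape and the noise level of 0.1 are arbitrary assumptions). It shows the two usual ways of using white Gaussian noise as augmentation: a `GaussianNoise` layer that perturbs the inputs on the fly during training only, and an offline `jitter` helper that produces noisy copies of the training set.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch of noise injection as augmentation (toy MNIST-shaped input,
# purely illustrative). tf.keras.layers.GaussianNoise adds zero-mean white
# Gaussian noise to its inputs during training only, so the network never
# "learns" the noise itself; it just sees a slightly different input each epoch.
model = tf.keras.Sequential([
    tf.keras.layers.GaussianNoise(stddev=0.1, input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The equivalent "offline" jittering: augment the training set with noisy copies.
def jitter(x, sigma=0.1, seed=0):
    """Return x plus zero-mean Gaussian noise; sigma is an arbitrary choice here."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, sigma, size=x.shape).astype(x.dtype)
```

Whether this actually helps is an empirical question: the noise scale has to be tuned, and the gain (if any) shows up in validation performance rather than in the model learning anything about the noise itself.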
