Solved – What are some useful data augmentation techniques for deep convolutional neural networks

deep learningmachine learning

Background:
I recently understood on a deeper level the importance of data augmentation when training convolutional neural networks after seeing this excellent talk by Geoffrey Hinton.

He explains that current generation convolutional neural networks are not able to generalize the frame of reference of the object under test, making it hard for a network to truly understand that mirrored images of an object are the same.

Some research has gone into trying to remedy this. Here is one of the many many examples.
I think this helps to establish how critical data augmentation is today when training convolutional neural networks.

Data augmentation techniques are rarely benchmarked against each other. Hence:

Questions:

What are some papers where the practitioners reported exceptionally better performance?
What are some data augmentation techniques that you have found useful?

Best Answer

Sec. 1: Data Augmentation Since deep networks need to be trained on a huge number of training images to achieve satisfactory performance, if the original image data set contains limited training images, it is better to do data augmentation to boost the performance. Also, data augmentation becomes the thing must to do when training a deep network.

There are many ways to do data augmentation, such as the popular horizontally flipping, random crops and color jittering. Moreover,
you could try combinations of multiple different processing, e.g.,
doing the rotation and random scaling at the same time. In addition,
you can try to raise saturation and value (S and V components of the
HSV color space) of all pixels to a power between 0.25 and 4 (same
for all pixels within a patch), multiply these values by a factor
between 0.7 and 1.4, and add to them a value between -0.1 and 0.1.
Also, you could add a value between [-0.1, 0.1] to the hue (H
component of HSV) of all pixels in the image/patch.

Krizhevsky et al. 1 proposed fancy PCA when training the famous Alex-Net in 2012. Fancy PCA alters the intensities of the RGB
channels in training images. In practice, you can firstly perform PCA on the set of RGB pixel values throughout your training images. And
then, for each training image, just add the following quantity to
each RGB image pixel (i.e., I_{xy}=[I_{xy}^R,I_{xy}^G,I_{xy}^B]^T):
[bf{p}_1,bf{p}_2,bf{p}_3][alpha_1 lambda_1,alpha_2 lambda_2,alpha_3
lambda_3]^T where, bf{p}_i and lambda_i are the i-th eigenvector and
eigenvalue of the 3times 3 covariance matrix of RGB pixel values,
respectively, and alpha_i is a random variable drawn from a Gaussian
with mean zero and standard deviation 0.1. Please note that, each
alpha_i is drawn only once for all the pixels of a particular
training image until that image is used for training again. That is
to say, when the model meets the same training image again, it will
randomly produce another alpha_i for data augmentation. In 1, they
claimed that “fancy PCA could approximately capture an important
property of natural images, namely, that object identity is invariant to changes in the intensity and color of the illumination”. To the
classification performance, this scheme reduced the top-1 error rate
by over 1% in the competition of ImageNet 2012.

(Source: Must Know Tips/Tricks in Deep Neural Networks (by Xiu-Shen Wei))

Best Answer

Related Solutions

CNNs – What the Convolution Step Does in a Convolutional Neural Network

Solved – Convolutional neural networks: Aren’t the central neurons over-represented in the output

Related Question