Solved – What are some useful data augmentation techniques for deep convolutional neural networks

deep learning, machine learning

Background:
I recently understood on a deeper level the importance of data augmentation when training convolutional neural networks after seeing this excellent talk by Geoffrey Hinton.

He explains that current-generation convolutional neural networks are unable to generalize across the frame of reference of an object, making it hard for a network to truly understand that a mirrored image of an object depicts the same object.

Some research has gone into trying to remedy this; here is one of many examples.
I think this establishes how critical data augmentation is today when training convolutional neural networks.

Data augmentation techniques are rarely benchmarked against each other. Hence:

Questions:

  • What are some papers in which practitioners reported exceptional performance gains from data augmentation?

  • What are some data augmentation techniques that you have found useful?

Best Answer

Sec. 1: Data Augmentation

Since deep networks need to be trained on a huge number of images to achieve satisfactory performance, if the original data set contains only a limited number of training images, it is better to perform data augmentation to boost performance. Indeed, data augmentation has become a must when training a deep network.

  • There are many ways to do data augmentation, such as the popular
    horizontal flipping, random crops, and color jittering. Moreover, you
    could try combinations of several transformations, e.g., applying
    rotation and random scaling at the same time. In addition, you can
    raise the saturation and value (the S and V components of the HSV
    color space) of all pixels to a power between 0.25 and 4 (the same
    power for all pixels within a patch), multiply these values by a
    factor between 0.7 and 1.4, and add to them a value between -0.1 and
    0.1. You could also add a value between -0.1 and 0.1 to the hue (the
    H component of HSV) of all pixels in the image/patch.
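The HSV jittering described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the cited source: the function name `jitter_hsv` is mine, and it assumes the image is already in HSV as a float array with H, S, and V all scaled to [0, 1] (H wraps around).

```python
import numpy as np

def jitter_hsv(hsv, rng):
    """Randomly jitter an HSV image (float array, H/S/V all in [0, 1]).

    Raises S and V to a random power in [0.25, 4], multiplies them by a
    factor in [0.7, 1.4], and adds an offset in [-0.1, 0.1]; shifts H by
    a value in [-0.1, 0.1]. Each random parameter is drawn once and
    applied to every pixel, as described in the text.
    """
    out = np.empty_like(hsv)
    for i in (1, 2):  # channels 1 and 2 are S and V
        power = rng.uniform(0.25, 4.0)
        factor = rng.uniform(0.7, 1.4)
        offset = rng.uniform(-0.1, 0.1)
        out[..., i] = np.clip(hsv[..., i] ** power * factor + offset, 0.0, 1.0)
    out[..., 0] = (hsv[..., 0] + rng.uniform(-0.1, 0.1)) % 1.0  # hue wraps
    return out
```

In a training pipeline you would typically apply this per patch, drawing fresh random parameters every time an image is sampled.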

  • Krizhevsky et al. [1] proposed fancy PCA when training the famous
    AlexNet in 2012. Fancy PCA alters the intensities of the RGB
    channels in training images. In practice, you first perform PCA on
    the set of RGB pixel values throughout your training images. Then,
    for each training image, you add the following quantity to each RGB
    pixel $I_{xy} = [I_{xy}^R, I_{xy}^G, I_{xy}^B]^T$:
    $[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3][\alpha_1 \lambda_1,
    \alpha_2 \lambda_2, \alpha_3 \lambda_3]^T$, where $\mathbf{p}_i$ and
    $\lambda_i$ are the $i$-th eigenvector and eigenvalue of the
    $3 \times 3$ covariance matrix of RGB pixel values, respectively,
    and $\alpha_i$ is a random variable drawn from a Gaussian with mean
    zero and standard deviation 0.1. Note that each $\alpha_i$ is drawn
    only once per training image and is reused for all of that image's
    pixels until the image is used for training again; when the model
    meets the same image again, new $\alpha_i$ values are drawn. In [1],
    the authors claimed that fancy PCA "approximately captures an
    important property of natural images, namely, that object identity
    is invariant to changes in the intensity and color of the
    illumination". In terms of classification performance, this scheme
    reduced the top-1 error rate by over 1% in the ImageNet 2012
    competition.
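The fancy PCA recipe above can be sketched as follows. This is a minimal NumPy illustration of the scheme, not the original AlexNet code: the function name `fancy_pca` is mine, and it assumes RGB images stored as a float array of shape (N, H, W, 3) with values in [0, 1].

```python
import numpy as np

def fancy_pca(images, rng, sigma=0.1):
    """Fancy PCA color augmentation, as described in Krizhevsky et al. [1].

    images: float array of shape (N, H, W, 3), RGB values in [0, 1].
    Returns one augmented copy of each image; a fresh alpha vector is
    drawn per image, matching the "drawn once per image" rule above.
    """
    # PCA on the set of RGB pixel values across all training images
    pixels = images.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)         # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)     # lambda_i, p_i (columns)

    out = np.empty_like(images)
    for n in range(images.shape[0]):
        alpha = rng.normal(0.0, sigma, size=3)      # alpha_i ~ N(0, 0.1)
        shift = eigvecs @ (alpha * eigvals)         # [p1 p2 p3][a1 l1, a2 l2, a3 l3]^T
        out[n] = np.clip(images[n] + shift, 0.0, 1.0)
    return out
```

Note that the original paper works with pixel values on a 0–255 scale; with the [0, 1] convention assumed here the additive shift is correspondingly small.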

(Source: Must Know Tips/Tricks in Deep Neural Networks (by Xiu-Shen Wei))