How does Keras ImageDataGenerator standardize data?

keras, neural networks, standardization

If I understand correctly, the ImageDataGenerator class is a generator that returns batches of images when called. What I don't understand are these two options:

  • featurewise_center
  • featurewise_std_normalization

The documentation says:

  • featurewise_center: Boolean. Set input mean to 0 over the dataset, feature-wise.
  • featurewise_std_normalization: Boolean. Divide inputs by std of the dataset, feature-wise.

How can you set the mean to 0 over the entire dataset when you only have one batch of images at any given time? The same question applies to the std normalization.

Best Answer

You are right: dataset-wide statistics such as the mean and standard deviation cannot be determined from a single batch of loaded samples.

This is why, before you use ImageDataGenerator to generate batches, you need to fit it to your data. Fitting computes the statistics needed for normalization, and those stored statistics are then applied to every batch.
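
The fit-then-apply idea can be sketched in plain NumPy. This is only an illustration of the concept, not Keras's actual code: the random data, the helper names fit_statistics and standardize, the channels-last layout, and the epsilon value are all assumptions made for the example, and the real implementation may differ in such details.

```python
import numpy as np

# Hypothetical stand-in for a training set: 1000 RGB images of 8x8 pixels,
# channels last, with a deliberately non-zero mean and non-unit spread.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=120.0, scale=50.0, size=(1000, 8, 8, 3))

def fit_statistics(x):
    """Compute one mean and one std per channel over the WHOLE dataset."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)
    std = x.std(axis=(0, 1, 2), keepdims=True)    # shape (1, 1, 1, 3)
    return mean, std

def standardize(batch, mean, std, eps=1e-6):
    """Apply the stored dataset statistics to a single batch."""
    return (batch - mean) / (std + eps)

# Step 1: one pass over the full dataset (the role played by datagen.fit).
mean, std = fit_statistics(x_train)

# Step 2: each batch reuses the stored statistics; nothing is recomputed.
batch_size = 32
batches = [standardize(x_train[i:i + batch_size], mean, std)
           for i in range(0, len(x_train), batch_size)]

all_standardized = np.concatenate(batches)
dataset_mean = all_standardized.mean(axis=(0, 1, 2))   # per-channel, whole set
first_batch_mean = batches[0].mean(axis=(0, 1, 2))     # per-channel, one batch
```

Taken over all batches together, the per-channel mean comes out at zero (up to floating-point error) and the std at roughly one. A single batch such as first_batch_mean is only approximately centered, because it is normalized with the dataset's statistics rather than its own.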

Here is an example from the Keras documentation:

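from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils

num_classes = 10  # CIFAR-10 has 10 classes

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)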

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(x_train)

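# fits the model on batches with real-time data augmentation
# (`model` is assumed to be an already compiled Keras model and `epochs` an int;
# steps_per_epoch must be an integer, hence the floor division;
# newer Keras versions deprecate fit_generator in favor of model.fit):
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32, epochs=epochs)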