I'm a little confused. I've read that batch normalization leads to faster convergence and higher accuracy,
but the opposite is happening in my case: after adding normalization, my accuracy actually decreased.
Is there something I'm missing?
Here is the code I'm using:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Bidirectional,
                                     Conv1D, Dense, Dropout, Flatten, LSTM,
                                     MaxPooling1D)

model = Sequential()
model.add(Bidirectional(LSTM(64, return_sequences=True),
                        input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Conv1D(filters=16, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=32, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=64, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(150))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.4))
model.add(Dense(10))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.4))
model.add(Dense(dummy_y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['categorical_accuracy'])
model.summary()
Is this the right place for batch normalization, or have I done something wrong?
Best Answer
Try putting your batch normalization layers AFTER the activation. As written, each BatchNormalization normalizes its input to zero mean immediately before a ReLU, so roughly half of the pre-activations are negative and get zeroed out. You are effectively killing off half of your gradient at each layer, which can lead to vanishing gradients.
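A quick numpy sketch of the effect described above (the distribution parameters and layer size here are made up purely for illustration): normalizing to zero mean right before a ReLU leaves only about half of the units active, whereas the same data without normalization keeps most units firing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated pre-activations for one layer: 10,000 samples x 64 units,
# with a positive mean (as real conv outputs often have).
x = rng.normal(loc=2.0, scale=1.5, size=(10_000, 64))

# Batch normalization without the learned scale/shift (gamma=1, beta=0):
# subtract the per-unit batch mean and divide by the per-unit batch std.
x_bn = (x - x.mean(axis=0)) / x.std(axis=0)

# ReLU after normalization: activations are centered at 0,
# so roughly half of them are zeroed out.
active_after_bn = (np.maximum(x_bn, 0) > 0).mean()

# ReLU on the raw, positively shifted activations: most units stay active.
active_raw = (np.maximum(x, 0) > 0).mean()

print(f"fraction active after BN + ReLU: {active_after_bn:.2f}")
print(f"fraction active with raw + ReLU: {active_raw:.2f}")
```

The fraction active after BN + ReLU comes out near 0.5, while the raw data keeps around 90% of units active, which is why moving BatchNormalization after the Activation (Conv1D → ReLU → BN) can behave quite differently from the BN-before-ReLU ordering in your model.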