I'm training an MLP neural network with one hidden layer and batch gradient descent using Keras/Tensorflow.
Applying dropout to the input layer increased the training time per epoch by about 25 %, independent of the dropout rate.
That dropout increases the number of epochs needed to reach a validation loss minimum is clear, but I thought that the training time per epoch would decrease by dropping out units.
Does anyone know the reason?
Best Answer
That's not the case, though I understand your rationale: you thought that zeroing out components would mean less computation. That would be true for sparse matrices, but not for dense ones.
TensorFlow, like any deep learning framework, uses vectorized operations on dense matrices*. This means the number of zeros makes no difference: the matrix operations are computed over all entries either way.
In reality, the opposite is true, because dropout requires extra work on top of the usual forward and backward passes: every batch, the framework has to sample a fresh random mask, multiply the activations by it, and rescale the surviving units. None of that cost depends on the dropout rate, which is why you see a roughly constant ~25 % overhead regardless of the rate you choose.
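A minimal NumPy sketch of inverted dropout (the scaling scheme Keras uses at training time) makes the extra work visible; the array shape and rate here are just illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate):
    # Extra per-batch work compared to a plain forward pass:
    # 1) sample a random mask the same shape as the activations,
    mask = rng.random(x.shape) >= rate
    # 2) multiply by the mask, 3) rescale survivors by 1/(1-rate)
    # so the expected activation is unchanged.
    return x * mask / (1.0 - rate)

x = np.ones((4, 3))
y = dropout(x, rate=0.5)
# Surviving entries are scaled to 2.0, dropped entries are 0.0,
# but the mask sampling and elementwise ops run over ALL entries
# no matter what the rate is.
```

Note that steps 1–3 touch every entry of the activation array regardless of how many units end up dropped, so the overhead is independent of the dropout rate.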
* They also support sparse matrices, but these don't make sense for most weight matrices: sparse formats pay off only when the vast majority of entries are zero.
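To illustrate the footnote, a quick sketch with SciPy's CSR format shows why sparsity only pays off at very high zero fractions: the sparse representation stores the nonzero values plus their indices, so it is smaller than the dense array only when almost everything is zero (99 % here, chosen for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Dense 1000x1000 matrix where ~99% of entries are zero.
dense = rng.random((1000, 1000))
dense[dense < 0.99] = 0.0

sparse = csr_matrix(dense)

dense_bytes = dense.nbytes
# CSR stores the nonzero values plus column indices and row pointers,
# so its footprint grows with the number of nonzeros, not the shape.
sparse_bytes = (sparse.data.nbytes
                + sparse.indices.nbytes
                + sparse.indptr.nbytes)
```

With dropout rates of 0.2–0.5, roughly half or more of the entries are still nonzero, so a sparse representation would cost more memory and slower kernels than just running the dense operation.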