Solved – Value of the keep probability when calculating loss with dropout

classification, dropout, neural networks

I'm training a small neural network (2 hidden layers) to classify MNIST images, and I want to apply dropout regularization before my output layer.

My first question: is it worth applying dropout to such a small network? My network is doing relatively well (around 98% accuracy consistently) without dropout.

My second question: If I do apply dropout and want to periodically check the value of my loss function, should I measure the loss without dropout (i.e. use a keep probability of 1.0)? I am using a keep probability of 0.2 for each training step.
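For concreteness, here is a minimal sketch of the setup I have in mind, assuming a TensorFlow 1.x-style keep_prob placeholder (the layer sizes and variable names are just illustrative):

```python
import tensorflow as tf  # assumes a TensorFlow 1.x-style API

# Hypothetical 2-hidden-layer MNIST classifier with dropout before the output layer.
x = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)            # fed differently for train vs. eval

h1 = tf.layers.dense(x, 100, activation=tf.nn.relu)
h2 = tf.layers.dense(h1, 100, activation=tf.nn.relu)
h2_drop = tf.nn.dropout(h2, keep_prob=keep_prob)  # dropout only before the output layer
logits = tf.layers.dense(h2_drop, 10)

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Training step: drop units (keep probability < 1.0).
# sess.run(train_op, feed_dict={x: batch_x, labels: batch_y, keep_prob: 0.2})

# Checking the loss: disable dropout by feeding keep_prob = 1.0.
# sess.run(loss, feed_dict={x: val_x, labels: val_y, keep_prob: 1.0})
```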

Best Answer

1) It's worth using dropout if you have a lot of fully connected neurons. So even with only 2 layers, if each layer has, say, 100 neurons, it's worth a try. Even with a single hidden layer it's still worth a try, since dropout is a perfectly reasonable form of regularization.

2) Dropout zeros out neurons; it does not change the loss function itself, which is computed from the final output of the network. Instead, you should be concerned with how the network is evaluated on your test or validation set, where each neuron's contribution needs to be scaled by the keep probability so that activations have the same expected value as during training. This scaling is generally done automatically in most neural network packages and implementations. On the training set, by contrast, the loss after a single dropout step depends on which neurons happened to be kept during that step and on which batch of data was used, so it is inherently noisy.
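The test-time scaling mentioned above is typically implemented as "inverted" dropout: the kept activations are scaled up by 1/keep_prob during training, so that at evaluation time (keep probability effectively 1.0) no extra scaling is needed and the expected activations match. A minimal NumPy sketch, with the function name and shapes chosen purely for illustration:

```python
import numpy as np

def dropout(activations, keep_prob, training):
    """Inverted dropout: scale kept units by 1/keep_prob during training so
    the expected activation matches evaluation, where nothing is dropped."""
    if not training or keep_prob >= 1.0:
        return activations                        # evaluation: no units dropped, no scaling
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob         # keep ~keep_prob of units, rescale the rest

# Illustration: the training-time output matches the eval output in expectation.
h = np.ones((1000, 100))
train_out = dropout(h, keep_prob=0.2, training=True)
eval_out = dropout(h, keep_prob=0.2, training=False)
print(train_out.mean(), eval_out.mean())          # both close to 1.0 on average
```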
