MLP Training – Early-Stopping Without Dropout Layers

dropout | neural-networks | tensorflow

I am training a multi-layer perceptron (MLP) with 4 hidden layers. I found the best hyper-parameters using HParams with the following steps (a code sketch follows the list):

  1. Training the model on each combination of parameters, e.g. {'dropout_rate_of_l1': 0.1, 'dropout_rate_of_l2': 0.6, 'dropout_rate_of_l3': 0.3, 'dropout_rate_of_l4': 0}; there are about 3500 combinations in total;
  2. 20% of the samples in the training set were used as the validation set in this process;
  3. Training each parameter combination for 300 steps and saving the model with the lowest error on the validation set;
  4. Taking the parameter combination with the lowest validation error across all combinations as the final hyper-parameters.
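
Here is a minimal sketch of that grid search, assuming a Keras MLP and the TensorBoard HParams plugin. The builder `build_mlp`, the data `x_train`/`y_train`, the layer width of 64, and the candidate rate grid are placeholders I made up for illustration, not the exact setup; the 300 "steps" are treated as epochs here.

```python
import itertools
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

RATES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # candidate dropout rates (assumed)

def build_mlp(rates):
    """4-hidden-layer MLP with one dropout layer after each hidden layer."""
    model = tf.keras.Sequential()
    for r in rates:
        model.add(tf.keras.layers.Dense(64, activation="relu"))
        model.add(tf.keras.layers.Dropout(r))          # rate 0 acts as identity
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model

best_loss, best_rates = float("inf"), None
for rates in itertools.product(RATES, repeat=4):       # step 1: every combination
    model = build_mlp(rates)
    hparams = {f"dropout_rate_of_l{i + 1}": r for i, r in enumerate(rates)}
    run_dir = "logs/" + "-".join(map(str, rates))
    history = model.fit(
        x_train, y_train,                              # assumed to exist
        validation_split=0.2,                          # step 2: 20% validation split
        epochs=300,                                    # step 3: 300 training steps (epochs here)
        callbacks=[
            hp.KerasCallback(run_dir, hparams),        # log hyper-parameters to TensorBoard
            tf.keras.callbacks.ModelCheckpoint(        # step 3: keep best model per run
                run_dir + "/best.keras", save_best_only=True),
        ],
        verbose=0,
    )
    val_loss = min(history.history["val_loss"])
    if val_loss < best_loss:                           # step 4: keep best combination
        best_loss, best_rates = val_loss, rates
```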

Finally, I got dropout rates of [0, 0, 0, 0] for the 4 dropout layers. Then I trained my model with early stopping and a patience of 20. The details can be found here: https://tensorboard.dev/experiment/0kGL4vOuRpamHzALyGna1Q/#hparams
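
For reference, this is a sketch of the final training run with early stopping and patience of 20, reusing the hypothetical `build_mlp`, `x_train`, and `y_train` names from the sketch above; the epoch cap is an arbitrary upper bound, not a value from the experiment.

```python
import tensorflow as tf

model = build_mlp([0.0, 0.0, 0.0, 0.0])   # selected rates: no effective dropout

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=20,                 # stop after 20 epochs with no improvement
    restore_best_weights=True,   # roll back to the best validation epoch
)

model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=1000,                 # upper bound; early stopping ends training sooner
    callbacks=[early_stop],
)
```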

My question is whether it is reasonable to train an MLP with early stopping but without any dropout layers.

Best Answer

Yes, this can certainly happen. Dropout is not always better or necessary; it is just one means of regularization among others.
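
As a side check (not part of the answer itself), a Dropout layer with rate 0 passes its input through unchanged, so the tuned model is exactly a plain MLP whose only regularizer is early stopping:

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(4, 8).astype("float32")
layer = tf.keras.layers.Dropout(rate=0.0)
# Even in training mode, rate 0 drops no units and applies no rescaling.
assert np.allclose(layer(x, training=True).numpy(), x)
```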