Solved – Deep Learning: Why does increasing batch_size cause overfitting and how does one reduce it

computer vision, deep learning, machine learning

I used to train my model on my local machine, where memory is only sufficient for 10 examples per batch. When I migrated the model to AWS and used a bigger GPU (Tesla K80), I could accommodate a batch size of 32. However, the AWS models all performed very poorly, with strong indications of overfitting. Why does this happen?

The model I am currently using is the inception-resnet-v2 model, and the problem I'm targeting is a computer vision one. One explanation I can think of is that batch normalization makes the model too dependent on the statistics of each training batch. As a mitigation, I reduced the batch_norm moving-average decay.
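For concreteness, here is a minimal sketch of what I mean by lowering the decay, assuming a tf.keras-style setup where `momentum` plays the role of the TF-Slim `decay` parameter; the layer arrangement and the value 0.9 are just illustrative, not my actual network:

```python
import tensorflow as tf

# Sketch: lowering the BatchNormalization moving-average momentum so the running
# statistics adapt more quickly after changing the batch size. tf.keras' default
# momentum is 0.99; the value 0.9 below is an illustrative assumption.
def conv_bn_block(x, filters, bn_momentum=0.9):
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization(momentum=bn_momentum)(x)
    return tf.keras.layers.ReLU()(x)
```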

Also, should I use dropout together with batch_norm? Is this a common practice?
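To make the dropout question concrete, this is roughly the placement I have in mind, assuming a tf.keras-style classifier head; the global-average-pooling head and the 0.5 rate are illustrative assumptions, not my actual code:

```python
import tensorflow as tf

# Sketch: dropout applied only in the classifier head, after the batch-normalized
# convolutional features, rather than between every convolution.
def classifier_head(features, num_classes, dropout_rate=0.5):
    x = tf.keras.layers.GlobalAveragePooling2D()(features)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    return tf.keras.layers.Dense(num_classes, activation="softmax")(x)
```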

I have around 5000 training images, and I trained for around 60 epochs. Is this considered a lot, or should I stop the training earlier?
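As an alternative to a fixed 60 epochs, this is the kind of early stopping I am considering, sketched with tf.keras callbacks; the patience value and argument names are illustrative assumptions:

```python
import tensorflow as tf

# Sketch: stop training when validation loss stops improving instead of running a
# fixed number of epochs; keep the best weights seen so far.
def train_with_early_stopping(model, train_ds, val_ds, max_epochs=60):
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True
    )
    return model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=max_epochs,
        callbacks=[early_stop],
    )
```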

Best Answer

From Chapter 8 of Goodfellow et al.'s Deep Learning book:

Small batches can offer a regularizing effect (Wilson and Martinez, 2003), perhaps due to the noise they add to the learning process. Generalization error is often best for a batch size of 1. Training with such a small batch size might require a small learning rate to maintain stability because of the high variance in the estimate of the gradient. The total runtime can be very high as a result of the need to make more steps, both because of the reduced learning rate and because it takes more steps to observe the entire training set.
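A practical corollary of the learning-rate point above (not from the quoted passage, but a commonly used heuristic) is to scale the learning rate along with the batch size, so that moving from batch 10 to batch 32 does not silently change the effective step size per epoch. A minimal sketch, where the base learning rate of 0.01 is an illustrative assumption:

```python
# Sketch of linear learning-rate scaling when the batch size changes.
base_lr, base_batch = 0.01, 10   # assumed settings from the smaller-GPU run
new_batch = 32                   # batch size on the larger GPU
scaled_lr = base_lr * new_batch / base_batch
print(scaled_lr)  # 0.032
```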
