Solved – Why is training a deep convolutional neural network taking longer than anticipated

conv-neural-network, deep learning, deep-belief-networks, machine learning

It's taking me over 4 days to train a deep learning network on just 10,000 images of 224 x 224 pixels with 3 channels, using a batch size of 25. The machine has 32 GB of RAM, a Core i7 CPU, and a GTX 960 GPU. I'm using MatConvNet with cuDNN. Instead of training from scratch, I am starting from the ready-made VGG-Face model.

My question is: what could be the reason for the long training time and how could I fix this?

Best Answer

There are a number of reasons training might be taking a long time, but the first that comes to mind is that you haven't set an appropriate learning rate. If your learning rate is too high, then instead of descending towards a minimum, each update overshoots it and the loss bounces around erratically, and this could happen indefinitely. If your learning rate is too low (I've seen this happen less frequently in practice), your model will take longer to reach the minimum, but it will eventually arrive. I'm not familiar with the particular libraries you've referenced, so unfortunately I can't offer library-specific details.
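
To make the effect concrete, here is a minimal sketch in plain Python (not MatConvNet) that runs gradient descent on the toy function f(x) = x^2. The function name `gradient_descent` and the three learning rates are my own illustrative choices, and a one-dimensional quadratic is of course far simpler than a real network, so treat it only as an intuition aid.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent; return every iterate."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # standard gradient descent update
        xs.append(x)
    return xs


# Hypothetical learning rates chosen only to illustrate the three regimes.
too_high = gradient_descent(lr=1.1)     # overshoots: sign flips and |x| grows
too_low = gradient_descent(lr=0.001)    # converges, but barely moves in 50 steps
reasonable = gradient_descent(lr=0.3)   # shrinks quickly towards the minimum at 0

for name, xs in [("too high", too_high),
                 ("too low", too_low),
                 ("reasonable", reasonable)]:
    print(f"{name:>10}: |x| after {len(xs) - 1} steps = {abs(xs[-1]):.3e}")
```

On this toy problem the too-high rate moves further from the minimum with every step, the too-low rate is still close to its starting point after 50 steps, and the intermediate rate is effectively at the minimum. In a real network the same symptoms tend to show up as a training loss that jumps around or a loss that decreases very slowly.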

There is a pretty good general answer (with references) on Stack Overflow about setting a good learning rate in neural networks. See the link below.

https://stackoverflow.com/questions/11414374/neural-network-learning-rate-and-batch-weight-update