Solved – Unexpectedly slow convergence when classifying MNIST digits with a Neural Network

Tags: julia, neural networks

I recently finished my implementation of a multilayer artificial neural network in Julia; I train it with SGD (no momentum, no decay, no regularization, just basic SGD), computing the gradient by means of backpropagation. The weights and biases are initialized by sampling a standard normal distribution.
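For concreteness, the setup described above (plain SGD with backpropagation, weights and biases drawn from a standard normal) can be sketched in a few lines. This is a minimal NumPy sketch, not the asker's Julia code; the layer sizes (784–30–10), the sigmoid activation, and the quadratic cost are assumptions chosen to match the typical MNIST examples the question compares against.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed architecture: 784 inputs, 30 hidden units, 10 outputs.
# Weights and biases sampled from a standard normal, as in the question.
W1 = rng.standard_normal((30, 784)); b1 = rng.standard_normal((30, 1))
W2 = rng.standard_normal((10, 30));  b2 = rng.standard_normal((10, 1))

def sgd_step(x, y, lr=3.0):
    """One plain-SGD update (no momentum, no decay, no regularization)
    on a mini-batch. x: (784, m) inputs, y: (10, m) one-hot targets."""
    global W1, b1, W2, b2
    m = x.shape[1]
    # Forward pass.
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    # Backward pass (quadratic cost, sigmoid derivative a*(1-a)).
    d2 = (a2 - y) * a2 * (1 - a2)
    d1 = (W2.T @ d2) * a1 * (1 - a1)
    # Average the gradients over the mini-batch and update in place.
    W2 -= lr * (d2 @ a1.T) / m; b2 -= lr * d2.mean(axis=1, keepdims=True)
    W1 -= lr * (d1 @ x.T) / m;  b1 -= lr * d1.mean(axis=1, keepdims=True)
```

Each call to `sgd_step` is one "iteration" in the question's sense: a single parameter update computed from one mini-batch.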

I tested the implementation on the MNIST dataset, and it takes 5000 iterations for a network with one hidden layer to reach an accuracy above 0.90 on the test set. I've seen Python+NumPy examples online that achieve the same accuracy in just a few iterations, using what appears to be the same algorithm with the same hyperparameters. In addition, the accuracy reported on those sites is above 0.95, while I can only reach 0.93 at best; sometimes I even get an accuracy around 0.50 or 0.60.

What puzzles me is that the network does seem to work, even if convergence is slow. An accuracy between 0.90 and 0.93 seems acceptable for a network with these characteristics; however, the Python implementation appears to converge in far fewer iterations, and its accuracy seems a bit better on average.

Questions: For a good implementation of a multilayer neural network, using a mini-batch size of 20 and a learning rate of 3.0, how many iterations should be needed to classify the MNIST test set with an accuracy above 0.90 for most initializations? What kind of bug could cause slow convergence yet still yield acceptable accuracy? Could the gap between the maximum accuracy of my implementation and the accuracy reported for the Python implementation be due to the stochastic nature of the initialization procedure?

Best Answer

It was my mistake; I had the concepts mixed up: the examples online were doing 30 full passes through the training set (epochs), not 30 iterations. At 20 samples per mini-batch, one epoch over the 60,000 MNIST training images is 3000 iterations, so 30 epochs is 90,000 iterations, far more than the number I was using. I just tried 50,000 iterations and got an accuracy of 0.9467 on the first try.
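The bookkeeping behind the fix can be checked in a couple of lines (assuming the standard 60,000-image MNIST training set):

```python
train_size = 60_000                          # MNIST training set size
batch_size = 20                              # mini-batch size from the question
iters_per_epoch = train_size // batch_size   # 3000 parameter updates per full pass
epochs = 30
total_iters = epochs * iters_per_epoch       # what "30 epochs" costs in iterations
print(iters_per_epoch, total_iters)          # 3000 90000
```

So the online examples were effectively running 90,000 iterations, nearly twenty times the 5000 that seemed slow.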

For reference, one of the Python implementations I mentioned is part of this online book on neural networks: http://neuralnetworksanddeeplearning.com/chap1.html
