Solved – Increase training performance of a neural network with low learning rate

efficiency, machine learning, model-evaluation, neural networks, sums-of-squares

I am trying to train an Artificial Neural Network for classification. In the input layer, I have 402 neurons; the first 400 are binary, and the last two are floating-point values in the range -1 to 1. In the hidden layer I have 400 neurons, and in the output layer I have a single node which I want to represent values between -1 and 1.

I have tried to train this network using a vectorized implementation of back-propagation that I found online (I have tried different implementations, and also implemented one myself). My problem is that my network does not seem to learn much. If my learning rate is higher than around 0.0001, the network quickly falls into a local minimum, and with a lower learning rate the learning is (obviously) very, very slow.
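For concreteness, here is a minimal sketch of the kind of vectorized full-batch gradient-descent step described above, written for the 402-400-1 architecture. The activation functions (sigmoid hidden layer, tanh output to keep the output in [-1, 1]), the initialization, and the placeholder data are assumptions for illustration, not the implementation from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 402, 400, 1

# Placeholder data: 400 binary inputs plus 2 floats in [-1, 1], target in [-1, 1].
n_samples = 1000
X = np.hstack([rng.integers(0, 2, size=(n_samples, 400)),
               rng.uniform(-1, 1, size=(n_samples, 2))]).astype(float)
y = rng.uniform(-1, 1, size=(n_samples, 1))

# Small random initial weights (roughly 402*400 + 400*1, i.e. about 161,000 parameters).
W1 = rng.normal(0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.01

for epoch in range(100):
    # Forward pass, vectorized over the whole batch.
    h = sigmoid(X @ W1 + b1)          # hidden activations, shape (n_samples, 400)
    out = np.tanh(h @ W2 + b2)        # output in (-1, 1), shape (n_samples, 1)

    err = out - y                     # residuals
    sse = np.sum(err ** 2)            # sum-of-squares error, as in the question

    # Backward pass for the SSE loss.
    d_out = 2.0 * err * (1.0 - out ** 2)        # tanh derivative
    d_hidden = (d_out @ W2.T) * h * (1.0 - h)   # sigmoid derivative

    # Gradient-descent update, averaged over the batch.
    W2 -= learning_rate * (h.T @ d_out) / n_samples
    b2 -= learning_rate * d_out.mean(axis=0)
    W1 -= learning_rate * (X.T @ d_hidden) / n_samples
    b1 -= learning_rate * d_hidden.mean(axis=0)

    if epoch % 20 == 0:
        print(f"epoch {epoch:3d}  SSE = {sse:.2f}")
```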

I can generate as much training data as I need, so the amount of data is not a problem, but time is limited, so I would like to be able to train this network in a reasonable amount of time.

Do you have any intuition about what might be wrong, or how much data is needed to train this network of around 160,000 weights?

If it is relevant, I can upload some of the data.

In response to the comment by Martin, here are some learning statistics for different numbers of hidden neurons: Google Docs Spreadsheet

Another thing I have observed is that, for my dataset, a constant output of 0.3 results in an SSE of around 160, so I definitely want to get below this SSE.
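For reference, this constant-prediction baseline is easy to check directly. The targets below are placeholders for the actual data; substitute your own target vector.

```python
import numpy as np

y = np.random.default_rng(0).uniform(-1, 1, size=1000)  # placeholder targets

baseline_sse = np.sum((y - 0.3) ** 2)
print(f"SSE of constant prediction 0.3: {baseline_sse:.1f}")
# Any trained model should beat this number on the same targets.
```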

Best Answer

If time is a limiting factor, you could try reducing the number of hidden units substantially. In my experience, it's very rare to need this many hidden units. I would start with a small number (fewer than ten) of hidden units, and see whether this gives adequate performance.
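One quick way to test this is to sweep a handful of small hidden-layer sizes and compare validation SSE. The sketch below uses scikit-learn's MLPRegressor as a stand-in for the asker's own backprop code, with placeholder data; it is not the code from the question.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.hstack([rng.integers(0, 2, size=(2000, 400)),
               rng.uniform(-1, 1, size=(2000, 2))]).astype(float)
y = rng.uniform(-1, 1, size=2000)                       # placeholder targets

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for n_hidden in (2, 5, 10, 25, 50):
    model = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="tanh",
                         max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    sse = np.sum((model.predict(X_val) - y_val) ** 2)
    print(f"{n_hidden:3d} hidden units: validation SSE = {sse:.1f}")
```

If a network with a handful of hidden units already beats the constant-0.3 baseline, the extra 400-unit capacity is probably costing training time without buying accuracy.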

If you're more worried about local minima, it may be worth adding a stochastic component to your algorithm, based on simulated annealing or stochastic gradient descent. This may slow each iteration down a bit, but it will prevent local minima from being such a problem.
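Here is a minimal sketch of the mini-batch SGD idea: shuffle the data each epoch and update on small batches rather than the full training set. The `sgd` helper and its `grad_fn` signature are hypothetical scaffolding for illustration, demonstrated on a toy linear least-squares problem; plug in the backprop gradients from your own implementation instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(params, grad_fn, X, y, learning_rate=0.01, batch_size=32, epochs=10):
    """Mini-batch SGD. grad_fn(params, X_batch, y_batch) must return a list of
    gradients with the same shapes as params (this signature is an assumption,
    not the asker's code)."""
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)                 # fresh shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grads = grad_fn(params, X[idx], y[idx])
            params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params

# Example usage on a toy linear least-squares problem.
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5])

def linear_sse_grad(params, Xb, yb):
    (w,) = params
    err = Xb @ w - yb
    return [2.0 * Xb.T @ err / len(yb)]

w0 = [np.zeros(3)]
(w_fit,) = sgd(w0, linear_sse_grad, X, y, learning_rate=0.05, epochs=50)
print(w_fit)   # should approach [1.0, -2.0, 0.5]
```

The per-batch noise in the gradient is what gives the method its chance of escaping shallow local minima, and it also lets you make many cheap updates per pass over the data instead of one expensive full-batch update.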