Solved – How to fight underfitting in a deep neural net

autoencoders, classification, deep learning, optimization

When I started with ANNs I thought overfitting would be the main problem I'd have to fight. But in practice I can't even get my NN past the 20% error rate barrier, let alone beat my Random Forest score.

I'm looking for advice, general or specific, on what one should do to make a NN start capturing trends in the data.

For the NN implementation I use the Theano Stacked Auto Encoder tutorial code, which works great (less than 5% error rate) when classifying the MNIST dataset.
It is a multilayer perceptron with a softmax layer on top, and each hidden layer is pre-trained as an autoencoder (fully described in the tutorial, chapter 8).
There are ~50 input features and ~10 output classes. The NN uses sigmoid neurons and all data is normalized to [0, 1]. I have tried lots of different configurations: the number of hidden layers and the neurons in them (100->100->100, 60->60->60, 60->30->15, etc.), different learning and pre-training rates, and so on.
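Roughly, the preprocessing and the kind of configurations I mean look like this (the arrays X and y below are stand-ins for my real data, not the actual dataset):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    # Stand-ins for the real dataset: ~50 input features, ~10 output classes.
    X = np.random.rand(5000, 50)
    y = np.random.randint(0, 10, size=5000)

    # Scale every feature to [0, 1] to match the sigmoid units.
    scaler = MinMaxScaler(feature_range=(0, 1))
    X = scaler.fit_transform(X)

    # Some of the hidden layer configurations I tried.
    configurations = [
        [100, 100, 100],
        [60, 60, 60],
        [60, 30, 15],
    ]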

And the best I can get is a 20% error rate on the validation set and a 40% error rate on the test set.

By contrast, when I use a Random Forest (from scikit-learn) I easily get a 12% error rate on the validation set and 25%(!) on the test set.
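The Random Forest baseline is nothing fancy; it is roughly the following (again with stand-in data and near-default settings, not my exact script):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Stand-in data; the real features are already scaled to [0, 1].
    X = np.random.rand(5000, 50)
    y = np.random.randint(0, 10, size=5000)

    # Split into train / validation / test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_train, y_train, test_size=0.25, random_state=0)

    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)

    print("validation error:", 1.0 - rf.score(X_valid, y_valid))
    print("test error:", 1.0 - rf.score(X_test, y_test))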

How can it be that my deep NN with pre-training behaves so badly? What should I try?

Best Answer

The most obvious issue I see is the size of your hidden layers. Deep networks typically use much larger hidden layers than the ones you tried, which is why they are generally trained on a GPU and why training can take hours, days, and sometimes weeks.

The "capacity"of your network is not large enough to fully represent the underlying variability in the data (MNIST). Try increasing the size of your hidden layers.