Solved – Deep belief network performs worse than a simple MLP

deep-belief-networks, neural-networks

I tried to train a deep belief network to recognize digits from the MNIST dataset. Everything works fine, and I can train even quite a large network. The problem is that the best DBN performs worse than a simple multilayer perceptron with fewer neurons (trained until its error stabilized). Is this normal behaviour, or did I miss something?

Here is an example: a DBN with layers 784-512-512-64-10 (red/green/black: 100/200/400 iterations of RBM pretraining) versus an MLP 784-512-256-10 (blue line).
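For reference, here is a minimal sketch of the two architectures being compared, written in PyTorch purely for illustration (the layer sizes come from the description above; the actual training code is assumed, and in a real DBN the hidden layers would first be pretrained as stacked RBMs and only then fine-tuned with backpropagation):

```python
import torch.nn as nn

# MLP baseline: 784-512-256-10 (the "blue line" model)
mlp = nn.Sequential(
    nn.Linear(784, 512), nn.Sigmoid(),
    nn.Linear(512, 256), nn.Sigmoid(),
    nn.Linear(256, 10),
)

# DBN fine-tuning network: 784-512-512-64-10
# (hidden layers assumed to be initialized from pretrained RBMs
# before the whole stack is fine-tuned with backpropagation)
dbn_net = nn.Sequential(
    nn.Linear(784, 512), nn.Sigmoid(),
    nn.Linear(512, 512), nn.Sigmoid(),
    nn.Linear(512, 64), nn.Sigmoid(),
    nn.Linear(64, 10),
)
```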

Plot of classification error vs. training iteration:

[plot image]

I tested many more configurations, and the MLP almost always seems to be better (provided the MLP is not so large that it cannot be trained).

Best Answer

My guess is that your training procedure is simply unable to find good parameters for such a big model. It is quite hard to get an MLP with more than three layers to work well on image classification problems, even with DBN pretraining. Most image-classification papers I have seen use three layers, or even show that performance decreases when you use more layers (see, for example, Table 6 of this paper). So yes, this behavior is kind of normal.

It also fits with my experience with DBNs. After two or three layers, the generative performance of the DBN saturates or even decreases.

Another hint that you are likely facing an underfitting problem is your absolute error. Yann LeCun's MNIST results page shows that you can get below 3% error even with fairly small 3-layer MLPs, while your error seems to stay well above 5% for all of your models.

My suggestion would therefore be to stick with a smaller model or switch to more powerful optimization techniques.
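As an illustration of that suggestion, here is a hedged sketch of what "smaller model plus a stronger optimizer" could look like in PyTorch: the 784-512-256-10 MLP trained with Adam instead of plain SGD. The hyperparameters (learning rate, batch size, number of epochs) are illustrative assumptions, not tuned values.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# MNIST training data (flattened 28x28 images -> 784 inputs)
train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Smaller MLP: 784-512-256-10
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Adam as an example of a more powerful optimizer than plain SGD
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):  # epoch count is an arbitrary choice for the sketch
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```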
