Solved – Is finding a good multilayer perceptron (MLP) model just luck?

deep learning, machine learning, neural networks

I am trying to build an MLP model for a classification task. I am using TensorFlow in Python to build a DNN model, and there are a lot of hyperparameters to set. In particular, how many hidden layers to use and how many neurons to put in each layer is a tough question.

I have tried different combinations of hidden layers, and each of them gives a different result. I just try and try and try again …

It seems to me that if the process stops and I find a good model, it is just luck. For instance, if I stop with hidden layer sizes of [100, 1000, 2000, 5], I only found that configuration after many rounds of trial and error.
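To show what I mean, here is roughly the loop I keep re-running with a different hidden layer list each time (a minimal tf.keras sketch; the input dimension and number of classes are placeholders for my actual dataset):

```python
import tensorflow as tf

def build_mlp(hidden_sizes, num_classes, input_dim):
    """A plain MLP whose hidden layer sizes come from a list."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    for units in hidden_sizes:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    return model

# One of the many configurations I have tried:
model = build_mlp([100, 1000, 2000, 5], num_classes=10, input_dim=784)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```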

Is that really the case? Is there a systematic way to build an MLP model for a particular dataset?

P.S.: As I understand it, an MLP with multiple hidden layers is a deep neural network and therefore belongs to deep learning. Is that correct?

Best Answer

Yes, it has been observed that deep MLPs can be tricky to optimize. In fact, they suffer from diminishing returns with added complexity; see Big Neural Networks Waste Capacity by Yoshua Bengio and Yann Dauphin.

MLPs with multiple hidden layers do belong to the family of deep learning models, as they stack layers of neurons with nonlinear activation functions between them.

MLPs are universal function approximators, but as you noted, that doesn't mean it is easy to find a good parameterization of an MLP via optimization. Perhaps we haven't found the appropriate optimization method yet: most people predominantly use variants of minibatch stochastic gradient descent, both for their nice scaling properties and for their stochasticity, which might help us "jitter" our way out of some local minima during training.

However, larger-capacity MLPs are dense, fully connected models, and there are likely many second-order interactions between the units. This leads to poor conditioning of the optimization problem and can produce finicky results, precisely the kind you are observing. Bengio and Dauphin discuss this in the paper I linked above.
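To make the optimizer point concrete, here is a hedged sketch of comparing a few minibatch SGD variants on the same architecture. It reuses the hypothetical build_mlp helper from your question's sketch, and the training data here is synthetic stand-in data, not a real benchmark:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data; replace with your real features and labels.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# Minibatch SGD variants; momentum and adaptive step sizes often change
# how finicky training feels on dense, fully connected MLPs.
optimizers = {
    "sgd":          tf.keras.optimizers.SGD(learning_rate=0.01),
    "sgd_momentum": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "adam":         tf.keras.optimizers.Adam(learning_rate=0.001),
}

for name, opt in optimizers.items():
    # build_mlp is the sketch from the question above.
    model = build_mlp([100, 1000, 2000, 5], num_classes=10, input_dim=784)
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, batch_size=128, epochs=5,
                        validation_split=0.1, verbose=0)
    print(name, max(history.history["val_accuracy"]))
```

Running the same architecture under several optimizers like this at least separates "bad architecture" from "bad optimizer/learning-rate combination" before you conclude a configuration was pure luck.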

What type of data are you training on? If it's image data, it is highly recommended that you use a convolutional neural network. Convolutional layers offer all sorts of benefits over vanilla MLPs: it is hypothesized that their sparsity helps the network learn stronger, more discriminative features, and in general the use of convolutional layers is itself a strong prior for image data.
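For image data, a minimal convolutional baseline might look something like this (a sketch assuming tf.keras, 28×28 grayscale inputs, and 10 classes; adjust the shapes to your data):

```python
import tensorflow as tf

# A small CNN baseline; convolutional layers share weights across spatial
# positions, which acts as a strong prior for image data.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),          # e.g. 28x28 grayscale images
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```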

For language modeling, people seem to have had some success using Word2Vec embeddings. See one of the original, innovative articles from 2003, A Neural Probabilistic Language Model, and some of the more recent work by Mikolov, Sutskever and others on word embeddings and recurrent neural networks; see Distributed Representations of Words and Phrases and their Compositionality.
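As a concrete illustration of learned word embeddings (not Word2Vec itself, but an embedding layer trained jointly with the task, in the spirit of Bengio et al. 2003), a hedged tf.keras sketch might look like this; the vocabulary size and the binary classification task are illustrative assumptions:

```python
import tensorflow as tf

vocab_size, embed_dim = 10000, 100  # illustrative sizes, not from the papers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None,), dtype="int32"),      # variable-length token-id sequences
    tf.keras.layers.Embedding(vocab_size, embed_dim),  # learned word vectors
    tf.keras.layers.GlobalAveragePooling1D(),          # average the word vectors
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # e.g. binary text classification
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```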