MATLAB: Is the neural network performing worse as the number of hidden layers increases

Tags: Deep Learning Toolbox, hidden layers, neural network, number of, tutorial

Hello, I am currently using the MATLAB Neural Network Toolbox to experiment with the iris dataset. I am training with the "trainlm" algorithm, and I decided to see what would happen if I trained nets with hidden layer sizes of 1:20. I was not expecting any change in the classification error, but when I do this, the error jumps around and even increases as the hidden layer size grows.
I have been looking for a solution, but I cannot explain why the classification error begins to jump, or why it increases at all, as the hidden layer size increases.
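The sweep was essentially a loop of this form (a minimal sketch; patternnet with 'trainlm' and otherwise default settings are assumed):
[ input, target ] = iris_dataset; % 4x150 inputs, 3x150 one-hot targets
pcterr = zeros(1,20);
for h = 1:20
    net = patternnet( h, 'trainlm' ); % one hidden layer with h nodes
    net = train( net, input, target );
    output = net( input );
    pcterr(h) = 100*mean( vec2ind(output) ~= vec2ind(target) ); % error over all 150 samples
end
plot( 1:20, pcterr )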
Thank You

Best Answer

The ultimate goal is to obtain a net that performs well on non-training data that comes from the same or a similar source as the training data. This is called GENERALIZATION.
Frequent causes of failure are:
1. Not enough weights to adequately characterize the training data
2. Training data that does not adequately characterize the salient features of non-training data because of measurement error, transcription error, noise, interference, or insufficient sample size and variability
3. Fewer training equations than unknown weights
4. An unlucky random weight initialization
Various techniques used to mitigate these causes are:
1. Remove bad data and outliers (plots help)
2. Use enough training data to sufficiently characterize non-training data
3. Use enough weights to adequately characterize the training data
4. Use more training equations than unknown weights; the stability of solutions w.r.t. noise and errors increases as the ratio increases
5. Use the best of multiple random weight initialization and data-division designs (a loop for this is sketched later in this answer)
6. K-fold cross-validation
7. Validation stopping
8. Regularization (7 and 8 are illustrated in the sketch just below)
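For example, 7 and 8 are simple property settings on the net object (a sketch; H = 10 and regularization = 0.1 are illustrative values, not recommendations):
net = patternnet( 10 ); % H = 10 assumed for illustration
net.divideFcn = 'dividerand'; % random data division (the default)
net.divideParam.trainRatio = 0.70; % 0.70/0.15/0.15 is the default split
net.divideParam.valRatio = 0.15; % the val subset drives validation stopping
net.divideParam.testRatio = 0.15;
net.performParam.regularization = 0.1; % 0 = none (default)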
For the iris_dataset
[ input, target ] = iris_dataset;
[ I N ] = size(input) % [ 4 150 ]
[ O N ] = size(target) % [ 3 150 ]
Assuming the default 0.70/0.15/0.15 train/val/test data division, the number of training equations is approximately
Ntrneq = 0.7*N*O % 315
Assuming the default I-H-O node topology, the number of unknown weights is
Nw = (I+1)*H + (H+1)*O = (I+O+1)*H + O
Obviously, Nw <= Ntrneq when H <= Hub (upper bound), where
Hub = floor( (Ntrneq-O)/(I+O+1) ) % 39
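For example, at the largest size considered here:
H = 20;
Nw = (I+O+1)*H + O % 163, well below Ntrneq = 315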
Expecting decent solutions for H <= 20 seems reasonable. However, to mitigate the variability caused by the random initial weights and the random data division, design 10 nets for each value of H, as sketched below.
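A sketch of that design loop (Ntrials = 10; scoring each design on its own random test subset is an assumed, but typical, choice):
[ input, target ] = iris_dataset;
Ntrials = 10;
Hmax = 20;
pcterr = zeros( Ntrials, Hmax );
for h = 1:Hmax
    for i = 1:Ntrials
        net = patternnet( h, 'trainlm' ); % fresh random weights and data division
        [ net, tr ] = train( net, input, target );
        ytst = net( input(:, tr.testInd) ); % held-out test subset
        ttst = target(:, tr.testInd);
        pcterr(i,h) = 100*mean( vec2ind(ytst) ~= vec2ind(ttst) );
    end
end
minpcterr = min( pcterr ) % best of the 10 test errors for each H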
I have posted zillions of examples in both the NEWSGROUP and ANSWERS. I use patternnet for classification.
Hope this helps.
Thank you for formally accepting my answer
Greg