MATLAB: Reproducibility in neural network

Tags: MATLAB, neural network, overfitting, classification, random

I'm trying to break down the MATLAB neural network GUI by working out what each feature does. I'm keeping it simple by using the default training function (trainscg) and the MATLAB wine dataset for training/testing. For the time being, and for experimentation, I've removed the validation dataset and set the NN up with 50 hidden nodes.
What I can't work out is why the results it produces are exactly the same each time. It takes exactly the same number of epochs to reach the minimum gradient, the performance and gradient values are identical, and the confusion matrix is identical. The only explanation I can think of is that the data splitting and the initialisation of the weights are not randomised, but everywhere I look online suggests that, by default, MATLAB does randomise both.
What am I missing? Are the weights and datasets not randomised after all? Code being used is below.
% Load MATLAB default wine dataset.
[x1,t1] = wine_dataset;
% Create net, 50 hidden nodes.
net = patternnet(50);
% Split the data into a 75% training and 25% testing group. Validation
% removed.
net.divideParam.trainRatio = 3/4;
net.divideParam.valRatio = 0;
net.divideParam.testRatio = 1/4;
% Train the network (capture the trained net returned by train).
net = train(net,x1,t1);
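One way to test whether randomness is involved at all is to control the seed explicitly with MATLAB's rng function. This is a sketch, not part of my original script: if results still change between rng('shuffle') runs, the data split and initial weights are being randomised; fixing the seed should then make runs reproducible on purpose.

```matlab
% Sketch: control the random seed to test whether the split/weights
% are actually randomised.
rng('shuffle');                      % new seed each run -> results should differ
[x1,t1] = wine_dataset;
net = patternnet(50);
net.divideParam.trainRatio = 3/4;
net.divideParam.valRatio   = 0;
net.divideParam.testRatio  = 1/4;
net = train(net,x1,t1);
% For deliberate, exact reproducibility, fix the seed instead:
% rng(0);
```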

Best Answer

YIKES!!! You have entered the creepy world of
(TRUMPETS PLEASE!)
OVERTRAINING AN OVERFIT NET!!!
You can prevent the overtraining by
1. Using a validation set. Look at the performance plot
and see the drastic log-scale difference in performance
between the training and testing subset performances.
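A minimal sketch of point 1 (the split ratios here are illustrative, not prescriptive): restore a validation subset so training stops early when validation performance degrades, then inspect the performance plot.

```matlab
net = patternnet(50);
net.divideParam.trainRatio = 0.70;   % illustrative split
net.divideParam.valRatio   = 0.15;   % early stopping monitors this subset
net.divideParam.testRatio  = 0.15;
[net,tr] = train(net,x,t);
plotperform(tr)                      % compare train/val/test curves (log scale)
```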
2. Using regularization. With regression this means
replacing the performance function MSE with MSEREG
which is something like
MSEREG = MSE + lambda * norm(weights)
Therefore, if you use large weights or, more likely, too many weights due to too many hidden nodes, training will be terminated earlier.
However, with classification using patternnet, the default performance function is CROSSENTROPY. I am not sure whether MATLAB supports combining it with regularization.
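In newer toolbox versions, regularization is set through a performance parameter rather than by swapping the performance function; a sketch (the 0.1 weighting is illustrative, and switching performFcn to 'mse' is an assumption to sidestep the CROSSENTROPY question above):

```matlab
net = patternnet(50);
net.performFcn = 'mse';                 % assumption: use an MSE-type measure
net.performParam.regularization = 0.1;  % illustrative regularization weight
[net,tr] = train(net,x,t);
```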
3. Use the Bayesian regularization training function TRAINBR, which by default uses Nval = 0 and a form of MSEREG. HOWEVER, I'm not sure whether MATLAB supports combining it with CROSSENTROPY.
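Point 3 as a sketch; switching the performance function to 'mse' and removing the data division are assumptions consistent with how trainbr is normally used, not a tested recipe for this dataset:

```matlab
net = patternnet(50);
net.trainFcn = 'trainbr';   % Bayesian regularization
net.performFcn = 'mse';     % assumption: trainbr expects MSE-type performance
net.divideFcn = '';         % trainbr typically trains without a validation split
[net,tr] = train(net,x,t);
```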
4. Instead of preventing overtraining, you can prevent overfitting by just using fewer hidden nodes:
[x t] = wine_dataset;
[ I N ] = size(x) % [13 178 ]
[O N ] = size(t) % [ 3 178 ]
vart = mean(var(t',1)) % 0.21944
Ntst = round(0.25*N) % 45
Ntrn = N-Ntst % 133
Ntrneq = Ntrn*O % 399 training equations
5. When the net is configured with H = 50 hidden nodes, the number of unknown weights will be
Nw = (I+1)*H+(H+1)*O % 853 unknown weights
which is more than twice the number of training equations!!!
==> OVERFITTING!
H = 50
net = patternnet(H);
Nw = net.numWeightElements % 50 when unconfigured
net = configure(net,x,t);
Nw = net.numWeightElements % 853 when configured
Note: Training will automatically configure an unconfigured net
To avoid overfitting
Nw <= Ntrneq <==> H <= Hub
Hub = (Ntrneq-O)/(I+O+1) % 23.294
Therefore H <= 23 avoids overfitting
net.divideParam.trainRatio = 3/4;
net.divideParam.valRatio = 0;
net.divideParam.testRatio = 1/4;
[net tr y e] = train(net,x,t);
% y = net(x); e = t-y % error
NMSE = mse(e)/vart % 0.017875
Rsq = 1- NMSE % 0.98213
Therefore, the net models 98.2% of the average target variance.
However, the net is overfitted. Therefore, the difference between the test and training performances is very important.
Moreover, the net is a classifier. Therefore, the difference between the training and test performances in terms of CROSSENTROPY and CLASSIFICATION RATE is more important!
indtrn = tr.trainInd;
indval = tr.valInd % Empty matrix: 1-by-0
indtst = tr.testInd;
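The subset indices above can be used to compare training and test classification rates directly; a sketch using the toolbox's confusion function (which returns the fraction misclassified):

```matlab
y = net(x);
% Percent misclassified on each subset
etrn = 100*confusion(t(:,indtrn), y(:,indtrn))  % training error rate, %
etst = 100*confusion(t(:,indtst), y(:,indtst))  % test error rate, %
% A large gap between etst and etrn is the signature of
% overtraining an overfit net.
```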
TO BE CONTINUED
Hope this helps.
Thank you for formally accepting my answer
Greg