MATLAB: Normalizing input data and dividing data for training, validation, and testing


Could you help me, please? I have two questions about neural networks for solar irradiance forecasting. I use an MLP model (fitting network) with one hidden layer, 7 inputs, and 1 output (solar irradiation). My questions are the following: – Is it necessary to use the following commands to normalize my input data? (I use a sigmoid activation function in the hidden layer and a linear function in the output layer.)
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
Or can I just use the simple formula In = (Inn - Imin)/(Imax - Imin),
where In is the normalized input and Inn is the non-normalized input?
– My second question is about dividing the data for training. This is my code for the division:
inputs = A'; % used for training

targets = B'; % used for training
inputsTesting=C'; % used for test unseen by neural network
targetsTesting=D'; %used for test unseen by neural network
% Setup Division of Data for Training, Validation, Testing
net.divideFcn = 'dividerand'; % Divide data randomly
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainRatio = 75/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 10/100;% this is my problem !!!!!
% Create a Fitting Network
net = fitnet(H); % H = number of nodes in the hidden layer
% Training
net.trainFcn = 'trainlm'; % Levenberg-Marquardt
[net,tr] = train(net,inputs,targets);
outputs = net(inputsTesting); % inputs Testing :unseen by neural network
perf = mse(net,targetsTesting,outputs); % targets Testing: unseen by network
My question is: what does the command below mean? I think it is unnecessary because I already test with data unseen by the network. What can I do about this mistake?
net.divideParam.testRatio = 10/100;
Does the neural network use 10% of the data it has already seen for testing?
Please help.
Best regards

Best Answer

1. See the description and example given in the help fitnet and doc fitnet documentation:
[x,t] = simplefit_dataset;
net = fitnet(10);
net = train(net,x,t);
y = net(x);
perf = perform(net,y,t)
perf =
2. For some reason (BUG?) the numerical result is only given in doc fitnet, not in help fitnet.
3. HOWEVER, the result is scale dependent. Since perf for fitnet is the mean-squared error mse(t-y), multiplying t by a positive number a will increase perf by a factor of a^2.
4. Therefore, normalize perf by the performance of the NAIVE CONSTANT MODEL y = constant, whose error is minimized when the constant is the mean of the target.
y00 = repmat( mean(t,2), 1, size(t,2)); % output of the constant model
MSE00 = mse( t - y00 ) % reference MSE of the constant model
MSE00 = mean(var(t',1)) % equivalent one-line form: 8.3378
nperf = mse(t-y)/MSE00 % 1.7557e-05
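The scale argument in points 3 and 4 can be checked with a small sketch. It uses only base MATLAB; the toy targets, outputs, and the factor a are made-up values, and it assumes the model outputs scale together with the targets (as they would if the net were retrained on the scaled data). The helper names mseOf and nmseOf are hypothetical.

```matlab
% Toy data (hypothetical values, chosen only for illustration)
t = [1 2 4 7 11];              % targets, 1 x N
y = [1.1 1.9 4.2 6.8 11.3];    % model outputs, 1 x N
a = 10;                        % positive scale factor

% Plain mean-squared error, and MSE normalized by the constant-mean model
mseOf  = @(e) mean(e(:).^2);
nmseOf = @(t, y) mseOf(t - y) / mean(var(t', 1));

mse1  = mseOf(t - y);
mse2  = mseOf(a*t - a*y);      % raw MSE grows by a^2
nmse1 = nmseOf(t, y);
nmse2 = nmseOf(a*t, a*y);      % normalized MSE is unaffected by the scale

fprintf('MSE ratio   = %g (a^2 = %g)\n', mse2/mse1, a^2);
fprintf('NMSE before = %g, after = %g\n', nmse1, nmse2);
```

Because the normalized value does not depend on the units of the target, it is easier to compare across datasets than the raw MSE.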
5. Note that the only input to fitnet in the documentation example is the number of hidden nodes, H = 10.
6. HOWEVER, reading the corresponding documentation indicates that H = 10 is a default.
7. Therefore, the net creation statement can be replaced by
net = fitnet;
8. HOWEVER, there will be a different answer each time the code is run. This is because of
a. Default RANDOM data division 70/15/15
b. Default RANDOM initial weights
9. In order to duplicate results, initialize the RNG to the same initial state ( your choice ) BEFORE the train statement.
10. That is all you need to get a repeatable result.
11. HOWEVER, since the initial weights are random, there is no guarantee that the automatic choice is successful. In addition, in the general case, there is no guarantee that H = 10 is a good choice.
12. This can be mitigated by designing multiple nets in a double for loop over H = Hmin:dH:Hmax and Nweighttrials = 1:Ntrials, then choosing the net with the best validation-set performance.
13. HOWEVER, the perf value obtained above combines the result for all of the data subsets: trn, val and tst.
14. To obtain separate results for each subset, use the training record tr from
rng('default') % Or your favorite RNG state
[ net tr y e ] = train( net, x, t);
% y = net(x); % output
% e = t-y; % error
NMSE = mse(e)/MSE00
NMSEtrn = tr.best_perf/MSE00 % BIASED: trn used to obtain weights
NMSEval = tr.best_vperf/MSE00 % BIASED: val used to stop training AND pick best of multiple designs
NMSEtst = tr.best_tperf/MSE00 % Use to obtain UNBIASED performance estimate
% Complete example
[ x, t ] = simplefit_dataset;
MSE00 = mean( var( t',1) ) % 8.3378
net = fitnet;
rng( 'default' )
[net tr y e] = train(net, x, t);
NMSE = mse(e)/MSE00 % 1.7558e-05
NMSEtrn = tr.best_perf/MSE00 % 1.4665e-05
NMSEval = tr.best_vperf/MSE00 % 1.06e-05
NMSEtst = tr.best_tperf/MSE00 % 3.8155e-05
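As a cross-check, the per-subset values can also be recomputed from the errors and the index lists stored in the training record. This is a minimal sketch, assuming the default dividerand division and the default mse performance function without regularization; the variable names ending in 2 are hypothetical. It also shows that the three subsets do not overlap, so the samples selected by testRatio are used neither to fit the weights nor to stop training.

```matlab
[x, t] = simplefit_dataset;
MSE00 = mean(var(t', 1));            % reference MSE of the constant-mean model

net = fitnet;
net.trainParam.showWindow = false;   % suppress the training GUI
rng('default')
[net, tr] = train(net, x, t);
y = net(x);                          % outputs for all samples
e = t - y;                           % errors for all samples

% Recompute each subset's NMSE from the index lists in tr
NMSEtrn2 = mean(e(tr.trainInd).^2) / MSE00;
NMSEval2 = mean(e(tr.valInd).^2)   / MSE00;
NMSEtst2 = mean(e(tr.testInd).^2)  / MSE00;

% The subsets are disjoint and together cover every sample
nOverlap = numel(intersect(tr.trainInd, tr.testInd)) + ...
           numel(intersect(tr.trainInd, tr.valInd))  + ...
           numel(intersect(tr.valInd,  tr.testInd));
nTotal   = numel(tr.trainInd) + numel(tr.valInd) + numel(tr.testInd);

fprintf('NMSE trn/val/tst = %g / %g / %g\n', NMSEtrn2, NMSEval2, NMSEtst2);
fprintf('overlap = %d, samples covered = %d of %d\n', nOverlap, nTotal, size(x, 2));
```

These values should agree closely with tr.best_perf, tr.best_vperf and tr.best_tperf divided by MSE00.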
Hope this helps,
Thank you for formally accepting my answer
PS: Many real-world examples will require searches over H and the initial weights. If you reuse the same net, be sure to use the function CONFIGURE for weight initialization.
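The search over H and initial weights mentioned in point 12 and in the PS could look roughly like the sketch below. The range Hmin:dH:Hmax, the value of Ntrials, and the names bestNet, bestTr, bestH, and bestNMSEval are illustrative assumptions, not values from the answer. Following the PS, configure is called before each training run so that every trial starts from new random weights.

```matlab
[x, t] = simplefit_dataset;          % example data shipped with the toolbox
MSE00  = mean(var(t', 1));           % reference MSE of the constant-mean model

Hmin = 2; dH = 2; Hmax = 10;         % hypothetical search range for H
Ntrials = 5;                         % hypothetical number of trials per H

rng('default')                       % set once so the whole search is repeatable
bestNMSEval = Inf;
for H = Hmin:dH:Hmax
    net = fitnet(H);
    net.trainParam.showWindow = false;       % suppress the training GUI
    for trial = 1:Ntrials
        net = configure(net, x, t);          % new random initial weights
        [trained, tr] = train(net, x, t);    % new random data division
        NMSEval = tr.best_vperf / MSE00;
        if NMSEval < bestNMSEval             % select on the validation subset
            bestNMSEval = NMSEval;
            bestNet = trained;
            bestTr  = tr;
            bestH   = H;
        end
    end
end

% The test subset was not used for selection, so it gives the unbiased estimate
NMSEtst = bestTr.best_tperf / MSE00;
fprintf('Best H = %d, NMSEval = %g, NMSEtst = %g\n', bestH, bestNMSEval, NMSEtst);
```

Selecting on the validation result and reporting the test result keeps the final estimate unbiased, in line with the comments on NMSEval and NMSEtst above.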