# MATLAB: Normalizing input data & dividing data for training, validation, and test

lmertutorial

Could you please help me? I have two questions about neural networks for solar irradiance forecasting. I used an MLP (fitting) model with one hidden layer, 7 inputs, and 1 output (solar irradiance). My questions are the following: – Is it necessary to use the following commands to normalize my input data? (I use a sigmoid activation function in the hidden layer and a linear function in the output layer.)
```matlab
net.inputs{1}.processFcns  = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
```
Or can I just use the simple mathematical formula In = (Inn - Imin)/(Imax - Imin), where In is the normalized input and Inn is the non-normalized input?
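For reference, that manual formula can be written in MATLAB as follows (a sketch; `I` is an assumed matrix of raw inputs, one variable per row and one sample per column). Note that `mapminmax` maps to [-1,1] by default, whereas this formula maps to [0,1]:

```matlab
% Manual min-max normalization to [0,1]
% I is an assumed raw-input matrix: one input variable per row,
% one sample per column (the toolbox convention)
Imin = min(I,[],2);                 % per-variable minimum
Imax = max(I,[],2);                 % per-variable maximum
In   = (I - Imin) ./ (Imax - Imin); % normalized inputs in [0,1]
```

This relies on implicit expansion (R2016b and later); on older releases, `bsxfun` would be needed in place of the element-wise subtraction and division.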
– My second question is about dividing the data for training. This is my division code:
```matlab
inputs  = A';            % used for training
targets = B';            % used for training
inputsTesting  = C';     % test set, unseen by the neural network
targetsTesting = D';     % test set, unseen by the neural network

% Create a fitting network
net = fitnet(numberOfHiddenNodes);

% Set up division of data for training, validation, testing
net.divideFcn  = 'dividerand';        % divide data randomly
net.divideMode = 'sample';            % divide up every sample
net.divideParam.trainRatio = 75/100;
net.divideParam.valRatio   = 15/100;
net.divideParam.testRatio  = 10/100;  % this is my problem!
```
% training
```matlab
net.trainFcn = 'trainlm';                    % Levenberg-Marquardt
[net,tr] = train(net,inputs,targets);
outputs  = net(inputsTesting);               % inputsTesting: unseen by the network
perf     = mse(net,targetsTesting,outputs);  % targetsTesting: unseen by the network
```
My question is: what does the command below mean? I think this command is unnecessary because my test data is unseen by the network. So what can I do about this mistake?
```matlab
net.divideParam.testRatio = 10/100;
```
Does the neural network use 10% of the data it has already seen for testing?
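One possible way to handle this (a sketch, not part of the accepted answer): when a test set is already held out externally, the internal test split can be disabled by setting `testRatio` to zero, so that all the supplied data is divided between training and validation and the final evaluation uses only the external set:

```matlab
% Sketch: disable the internal test split when an external test set exists
% (the 85/15 split is an illustrative choice, not a recommendation)
net = fitnet(10);                     % 10 hidden nodes, for illustration
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 85/100;
net.divideParam.valRatio   = 15/100;
net.divideParam.testRatio  = 0;       % no internal test subset
[net,tr] = train(net,inputs,targets);
outputs  = net(inputsTesting);        % evaluate on the held-out set only
perf     = mse(net,targetsTesting,outputs);
```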
Best regards

1. See the description and example in the fitnet documentation (`help fitnet` and `doc fitnet`).
```matlab
[x,t] = simplefit_dataset;
net   = fitnet(10);
net   = train(net,x,t);
view(net)
y     = net(x);
perf  = perform(net,y,t)
% perf = 1.4639e-04
```
2. For some reason (a bug?) the numerical result is given only in `doc fitnet`, not in `help fitnet`.
3. HOWEVER, the result is scale dependent: since perf for fitnet is the mean squared error mse(t-y), multiplying t by a positive number a will increase perf by a factor of a^2.
4. Therefore, normalize perf by the performance of the NAIVE CONSTANT MODEL y = constant, whose error is minimized when the constant is the mean of the target:
```matlab
y00   = repmat(mean(t,2), 1, size(t,2));
MSE00 = mse(t - y00);        % equivalently, MSE00 = mean(var(t',1)) = 8.3378
nperf = mse(t-y)/MSE00       % 1.7557e-05
```
5. Note that the only input is the number of hidden nodes H = 10.
6. HOWEVER, reading the corresponding documentation indicates that H = 10 is the default.
7. Therefore, the net creation statement can be replaced by
```matlab
net = fitnet;
```
8. HOWEVER, there will be a different answer each time the code is run. This is because of
    a. the default RANDOM data division (70/15/15), and
    b. the default RANDOM initial weights.
9. In order to duplicate results, initialize the RNG to the same initial state (your choice) BEFORE the train statement.
10. That is all you need to get a repeatable result.
11. HOWEVER, since the initial weights are random, there is no guarantee that the automatic choice is successful. In addition, in the general case, there is no guarantee that H = 10 is a good choice.
12. This can be mitigated by designing multiple nets in a double for loop over H = Hmin:dH:Hmax and weight trials 1:Ntrials, then choosing the net with the best validation-set performance.
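A minimal sketch of that double design loop (the loop bounds `Hmin`, `dH`, `Hmax`, and `Ntrials` are illustrative values, not recommendations):

```matlab
% Sketch of the double design loop over hidden-layer size and weight trials
[x,t] = simplefit_dataset;       % example data; substitute your own
Hmin = 1; dH = 1; Hmax = 10;     % illustrative search range
Ntrials = 10;                    % illustrative number of weight trials per H
bestvperf = Inf;
for H = Hmin:dH:Hmax
    for trial = 1:Ntrials
        net = fitnet(H);         % fresh net: new random initial weights
        [net,tr] = train(net,x,t);
        if tr.best_vperf < bestvperf   % keep the best validation performance
            bestvperf = tr.best_vperf;
            bestnet   = net;
            bestH     = H;
        end
    end
end
```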
13. HOWEVER, the perf value obtained above combines the result for all of the data subsets: trn, val and tst.
14. To obtain separate results for each subset, use the training record tr from
```matlab
rng('default')                 % or your favorite RNG state
[net,tr,y,e] = train(net,x,t);
% y = net(x);                  % output
% e = t - y;                   % error
NMSE    = mse(e)/MSE00
NMSEtrn = tr.best_perf/MSE00   % BIASED: trn used to obtain weights
NMSEval = tr.best_vperf/MSE00  % BIASED: val used to stop training AND pick the best of multiple designs
NMSEtst = tr.best_tperf/MSE00  % use to obtain an UNBIASED performance estimate
```
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
```matlab
[x,t]  = simplefit_dataset;
MSE00  = mean(var(t',1))       % 8.3378
net    = fitnet;
rng('default')
[net,tr,y,e] = train(net,x,t);
view(net)
NMSE    = mse(e)/MSE00         % 1.7558e-05
NMSEtrn = tr.best_perf/MSE00   % 1.4665e-05
NMSEval = tr.best_vperf/MSE00  % 1.06e-05
NMSEtst = tr.best_tperf/MSE00  % 3.8155e-05
```
Hope this helps,
Thank you for formally accepting my answer
Greg
PS Many real-world examples will require searches over H and the initial weights. If you reuse the same net, be sure to use the function CONFIGURE for weight initialization.
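For example, a sketch of reusing one net object across trials (`Ntrials` is an illustrative value); `configure` resizes the net to the data and re-initializes its weights:

```matlab
% Sketch: re-initializing weights when reusing a single net object
[x,t]   = simplefit_dataset;    % example data; substitute your own
net     = fitnet(10);
Ntrials = 5;                    % illustrative number of trials
for trial = 1:Ntrials
    net = configure(net,x,t);   % fresh random weights for each trial
    [net,tr] = train(net,x,t);
    fprintf('trial %d: val perf = %g\n', trial, tr.best_vperf)
end
```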