# MATLAB: Normalizing input data & dividing data for training, validation, and test

lmertutorial

Could you please help me? I have two questions about neural networks for solar irradiance forecasting. I used an MLP (fitting) model with one hidden layer, 7 inputs, and 1 output (solar irradiance). My questions are the following: – Is it necessary to use the following commands to normalize my input data? (I use a sigmoid activation function in the hidden layer and a linear function in the output layer.)
```matlab
net.inputs{1}.processFcns  = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
```
Or can I just use the simple mathematical formula In = (Inn - Imin)/(Imax - Imin), where In is the normalized input and Inn is the non-normalized input?
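For reference, that manual formula can be written in MATLAB as follows (a sketch; `I` is an assumed matrix of raw inputs, one variable per row and one sample per column). Note that `mapminmax` maps to [-1,1] by default, whereas this formula maps to [0,1]:

```matlab
% Manual min-max normalization to [0,1]
% I is an assumed raw-input matrix: one input variable per row,
% one sample per column (the toolbox convention)
Imin = min(I,[],2);                 % per-variable minimum
Imax = max(I,[],2);                 % per-variable maximum
In   = (I - Imin) ./ (Imax - Imin); % normalized inputs in [0,1]
```

This relies on implicit expansion (R2016b and later); on older releases, `bsxfun` would be needed in place of the element-wise subtraction and division.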
– My second question is about dividing the data for training. This is my division code:
```matlab
inputs  = A';            % used for training
targets = B';            % used for training
inputsTesting  = C';     % test set, unseen by the neural network
targetsTesting = D';     % test set, unseen by the neural network

% Create a fitting network
net = fitnet(numberOfHiddenNodes);

% Set up division of data for training, validation, testing
net.divideFcn  = 'dividerand';        % divide data randomly
net.divideMode = 'sample';            % divide up every sample
net.divideParam.trainRatio = 75/100;
net.divideParam.valRatio   = 15/100;
net.divideParam.testRatio  = 10/100;  % this is my problem!
```
% training
```matlab
net.trainFcn = 'trainlm';                    % Levenberg-Marquardt
[net,tr] = train(net,inputs,targets);
outputs  = net(inputsTesting);               % inputsTesting: unseen by the network
perf     = mse(net,targetsTesting,outputs);  % targetsTesting: unseen by the network
```
My question is: what does the command below mean? I think this command is unnecessary because my test data is unseen by the network. So what can I do about this mistake?
```matlab
net.divideParam.testRatio = 10/100;
```
Does the neural network use 10% of the data it has already seen for testing?
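One possible way to handle this (a sketch, not part of the accepted answer): when a test set is already held out externally, the internal test split can be disabled by setting `testRatio` to zero, so that all the supplied data is divided between training and validation and the final evaluation uses only the external set:

```matlab
% Sketch: disable the internal test split when an external test set exists
% (the 85/15 split is an illustrative choice, not a recommendation)
net = fitnet(10);                     % 10 hidden nodes, for illustration
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 85/100;
net.divideParam.valRatio   = 15/100;
net.divideParam.testRatio  = 0;       % no internal test subset
[net,tr] = train(net,inputs,targets);
outputs  = net(inputsTesting);        % evaluate on the held-out set only
perf     = mse(net,targetsTesting,outputs);
```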
Best regards

1. See the description and example in the fitnet documentation (`help fitnet` and `doc fitnet`).
```matlab
[x,t] = simplefit_dataset;
net   = fitnet(10);
net   = train(net,x,t);
view(net)
y     = net(x);
perf  = perform(net,y,t)
% perf = 1.4639e-04
```
2. For some reason (a bug?) the numerical result is given only in `doc fitnet`, not in `help fitnet`.
3. HOWEVER, the result is scale dependent: since perf for fitnet is the mean squared error mse(t-y), multiplying t by a positive number a will increase perf by a factor of a^2.
4. Therefore, normalize perf by the performance of the NAIVE CONSTANT MODEL y = constant, whose error is minimized when the constant is the mean of the target:
```matlab
y00   = repmat(mean(t,2), 1, size(t,2));
MSE00 = mse(t - y00);        % equivalently, MSE00 = mean(var(t',1)) = 8.3378
nperf = mse(t-y)/MSE00       % 1.7557e-05
```
5. Note that the only input is the number of hidden nodes H = 10.
6. HOWEVER, reading the corresponding documentation indicates that H = 10 is the default.
7. Therefore, the net creation statement can be replaced by
```matlab
net = fitnet;
```
8. HOWEVER, there will be a different answer each time the code is run. This is because of
    a. the default RANDOM data division (70/15/15), and
    b. the default RANDOM initial weights.
9. In order to duplicate results, initialize the RNG to the same initial state (your choice) BEFORE the train statement.
10. That is all you need to get a repeatable result.
11. HOWEVER, since the initial weights are random, there is no guarantee that the automatic choice is successful. In addition, in the general case, there is no guarantee that H = 10 is a good choice.
12. This can be mitigated by designing multiple nets in a double for loop over H = Hmin:dH:Hmax and weight trials 1:Ntrials, then choosing the net with the best validation-set performance.
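A minimal sketch of that double design loop (the loop bounds `Hmin`, `dH`, `Hmax`, and `Ntrials` are illustrative values, not recommendations):

```matlab
% Sketch of the double design loop over hidden-layer size and weight trials
[x,t] = simplefit_dataset;       % example data; substitute your own
Hmin = 1; dH = 1; Hmax = 10;     % illustrative search range
Ntrials = 10;                    % illustrative number of weight trials per H
bestvperf = Inf;
for H = Hmin:dH:Hmax
    for trial = 1:Ntrials
        net = fitnet(H);         % fresh net: new random initial weights
        [net,tr] = train(net,x,t);
        if tr.best_vperf < bestvperf   % keep the best validation performance
            bestvperf = tr.best_vperf;
            bestnet   = net;
            bestH     = H;
        end
    end
end
```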
13. HOWEVER, the perf value obtained above combines the result for all of the data subsets: trn, val and tst.
14. To obtain separate results for each subset, use the training record tr from
```matlab
rng('default')                 % or your favorite RNG state
[net,tr,y,e] = train(net,x,t);
% y = net(x);                  % output
% e = t - y;                   % error
NMSE    = mse(e)/MSE00
NMSEtrn = tr.best_perf/MSE00   % BIASED: trn used to obtain weights
NMSEval = tr.best_vperf/MSE00  % BIASED: val used to stop training AND pick the best of multiple designs
NMSEtst = tr.best_tperf/MSE00  % use to obtain an UNBIASED performance estimate
```
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
```matlab
[x,t]  = simplefit_dataset;
MSE00  = mean(var(t',1))       % 8.3378
net    = fitnet;
rng('default')
[net,tr,y,e] = train(net,x,t);
view(net)
NMSE    = mse(e)/MSE00         % 1.7558e-05
NMSEtrn = tr.best_perf/MSE00   % 1.4665e-05
NMSEval = tr.best_vperf/MSE00  % 1.06e-05
NMSEtst = tr.best_tperf/MSE00  % 3.8155e-05
```
Hope this helps,
Thank you for formally accepting my answer
Greg
PS Many real-world examples will require searches over H and the initial weights. If you reuse the same net, be sure to use the function CONFIGURE for weight initialization.
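For example, a sketch of reusing one net object across trials (`Ntrials` is an illustrative value); `configure` resizes the net to the data and re-initializes its weights:

```matlab
% Sketch: re-initializing weights when reusing a single net object
[x,t]   = simplefit_dataset;    % example data; substitute your own
net     = fitnet(10);
Ntrials = 5;                    % illustrative number of trials
for trial = 1:Ntrials
    net = configure(net,x,t);   % fresh random weights for each trial
    [net,tr] = train(net,x,t);
    fprintf('trial %d: val perf = %g\n', trial, tr.best_vperf)
end
```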