MATLAB: How can I do multi-step-ahead prediction without knowing the validation input series? HELP

Tags: Deep Learning Toolbox, neural networks, validation definition

I am learning neural networks and doing some small exercises, but I have a big question that I cannot figure out. I have a time series (input = X, target = T). For training I use input_training = X(1:end-N) and target_train = T(1:end-N). My validation data are input_val = X(end-N+1:end) and target_val = T(end-N+1:end). I am testing a NARX net under these conditions:

 - input_val is available
 - target_val is not available

Under those conditions I get good predictions (error < 3%), but I would like to know how I could get good predictions when:

 - input_val is not available
 - target_val is not available

Thanks for your help.

Best Answer

What data division function are you using?

Training, validation and testing serve three separate functions. To obtain unbiased estimates of performance on nondesign data:

Total = Design + Test

Design = Training + Validation

Training subset:

 Used to directly estimate unknown weight values (e.g., via gradient descent)

Validation subset:

 Used REPETITIVELY with the training set to determine the best set of training
 parameters (e.g., number of hidden nodes, stopping epoch, selection of input and feedback delays, etc.) and the best of multiple designs with different random weight initializations.

Test subset:

 Used ONCE and ONLY ONCE on the best design w.r.t. validation subset 
 performance to obtain an UNBIASED estimate of performance on nondesign 
 data (AKA generalization).

If the test set estimate is unsatisfactory, the data set should be randomly divided again and the entire procedure repeated. Reusing the same data division biases the resulting test subset estimate.

Quite often the unbiased constraint of this procedure is violated by including the test subset in the choice of the best design. If this is done, I recommend, for the sake of caution, that another round with a new random division still be performed.

The above procedure is difficult to implement with time series because uniform spacing should be maintained to preserve output-feedback autocorrelations and input-output cross-correlations.
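The Total = Design + Test and Design = Training + Validation bookkeeping can be sketched with contiguous blocks, so that uniform spacing is kept inside each subset. This is a minimal plain-MATLAB sketch; the 70/15/15 ratios and the floor rounding are assumptions for illustration and are not claimed to match any toolbox divide function exactly.

```matlab
% Contiguous (block) split of a uniformly spaced series: Total = Design + Test
% Assumption: 70/15/15 ratios with floor rounding, chosen for illustration only
Q = 2003;                               % total number of time steps
trainRatio = 0.70;
valRatio   = 0.15;

Ntrn = floor(trainRatio*Q);             % training block (earliest data)
Nval = floor(valRatio*Q);               % validation block (follows training)
Ntst = Q - Ntrn - Nval;                 % test block gets the remainder (latest data)

trainInd = 1:Ntrn;
valInd   = Ntrn+1 : Ntrn+Nval;
testInd  = Ntrn+Nval+1 : Q;

Ndesign = Ntrn + Nval;                  % Design = Training + Validation
fprintf('Ntrn = %d, Nval = %d, Ntst = %d, Ndesign = %d\n', Ntrn, Nval, Ntst, Ndesign);
```

Because every subset is a single unbroken block, the spacing between consecutive samples inside each subset stays at one time step, which is what the correlation structure of the series requires.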

Now, I do not understand your problem because I do not understand why you are using a validation subset without a test set to estimate nondesign performance. Posting your code with comments would help immensely.

Hope this helps.

Greg

Related Solutions

MATLAB: Hi, I am using NARX to do multi-step prediction of a daily stock market index (Sensex, 2003×1 matrix) using another index as input (Nifty, 2003×1 matrix). I am having problems with the closed loop

%% 1. Importing data
% Nifty and Sensex are each 2003x1 matrices of daily stock market index data

 > load Nifty.dat;
 > load Sensex.dat;

% To scale the data it is converted to its log value:

 > lognifty = log(Nifty); 
 > logsensex = log(Sensex);
 > X = tonndata(lognifty,false,false); 
 > T =  tonndata(logsensex,false,false);

%% 2. Data preparation

 > N = 300; % Multi-step ahead prediction

% Input and target series are divided into two groups of data:
% 1st group: used to train the network
 > inputSeries  = X(1:end-N);
 > targetSeries = T(1:end-N);
% 2nd group: new data used for simulation. inputSeriesVal will be used for
% predicting new targets; targetSeriesVal will be used for network
% validation after prediction

Notation:

 data    = design + test
 design = training + validation

Val subsets are used repetitively with Trn subsets to DESIGN a net with a good set of training parameters (e.g., input delays, feedback delays, number of hidden nodes, stopping epoch, etc.). The best of multiple designs is, typically, chosen by indirectly minimizing MSEval.

After the best design is chosen, the nondesign Test subset is used to estimate generalization performance on nondesign data.

By DEFAULT, the data will be divided RANDOMLY into THREE trn/val/tst subsets according to

 dividerand( 2003, 0.7, 0.15, 0.15 )

I disagree with the use of dividerand for uniformly spaced time series. Replace it with one of the other divide functions. (When Nval = Ntst = 0, I use 'dividetrain'; otherwise I use 'divideblock' or 'divideind' to maintain uniform spacing.)
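A minimal sketch of switching a NARX net away from random division, assuming the Deep Learning Toolbox is installed. The 70/15/15 ratios are the defaults quoted above; calling divideblock directly on 2001 steps (2003 minus the 2 delay steps that preparets consumes when delay = 2) only serves to show that the resulting index sets are contiguous.

```matlab
% Sketch (assumes Deep Learning Toolbox): block division instead of dividerand
net = narxnet(1:2,1:2,50);              % delays and hidden size as in the question
net.divideFcn = 'divideblock';          % contiguous trn/val/tst blocks
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;

% Inspect the blocks directly: preparets leaves 2003 - 2 = 2001 time steps
[trainInd,valInd,testInd] = divideblock(2001,0.70,0.15,0.15);
fprintf('train %d-%d, val %d-%d, test %d-%d\n', ...
    trainInd(1),trainInd(end),valInd(1),valInd(end),testInd(1),testInd(end));
```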

 > inputSeriesVal  = X(end-N+1:end);
 > targetSeriesVal = T(end-N+1:end); % This is generally not available

Change "Val" to "Test": these subsets are used only for performance evaluation (NOT for "validation") and not for design.

Since a NNTBX BUG does not allow a test subset without a validation subset and vice versa, there are two options:

 1. Use trn/val/tst (Nval = Ntst = 300) with 'divideblock' or 'divideind'
    (recommended)
 2. a. Remove the tst subset (Ntst = 300) from training
    b. Do not use a val set (Nval = 0)
    c. Use 'dividetrain' to train only on the training data (Ntrn = 1703)
    d. Calculate the test subset performance separately
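Option 1 with 'divideind' could look like the following sketch, assuming the Deep Learning Toolbox, Nval = Ntst = 300 and delay = 2. The indices refer to the time steps that remain after preparets, which uses the first max(delay) steps to fill the initial delay states; that offset is the main thing to get right.

```matlab
% Sketch (assumes Deep Learning Toolbox): explicit contiguous blocks via divideind
Q     = 2003;                  % length of the original series
delay = 2;                     % ID = FD = 1:delay, as in the question
Nval  = 300;
Ntst  = 300;
Qs    = Q - delay;             % time steps left after preparets
Ntrn  = Qs - Nval - Ntst;      % everything before the last 600 steps

net = narxnet(1:delay,1:delay,50);
net.divideFcn = 'divideind';   % indices are given explicitly
net.divideParam.trainInd = 1:Ntrn;
net.divideParam.valInd   = Ntrn+1 : Ntrn+Nval;
net.divideParam.testInd  = Ntrn+Nval+1 : Qs;
```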

%% 3. Network Architecture

 > delay = 2;
 > neuronsHiddenLayer = 50;

Use the autocorrelation function of the target series to determine the significant feedback delays. Use the cross-correlation function between the input and target series to determine the significant input delays.
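A plain-MATLAB sketch of that idea, using a hypothetical input x and a target t that is simply x delayed by 3 steps, so the right answer is known. The +/-1.96/sqrt(N) significance bound is the usual large-sample white-noise approximation and is an assumption here, not necessarily the threshold Greg uses.

```matlab
% Hypothetical uniformly spaced series: t is x delayed by 3 steps
N = 200;
n = (1:N)';
x = sin(2*pi*n/20);
t = [zeros(3,1); x(1:N-3)];

maxLag = 10;
thresh = 1.96/sqrt(N);         % assumed approximate 95% bound for white noise

% Autocorrelation of the target -> candidate feedback delays (FD)
tc  = t - mean(t);
rtt = zeros(1,maxLag);
for k = 1:maxLag
    rtt(k) = sum(tc(1+k:N).*tc(1:N-k)) / sum(tc.^2);
end
FD = find(abs(rtt) > thresh);

% Cross-correlation of past input with target -> candidate input delays (ID)
xc  = x - mean(x);
rxt = zeros(1,maxLag+1);       % element k+1 holds lag k = 0..maxLag
for k = 0:maxLag
    rxt(k+1) = sum(xc(1:N-k).*tc(1+k:N)) / sqrt(sum(xc.^2)*sum(tc.^2));
end
[~,iMax] = max(abs(rxt));
bestInputLag = iMax - 1;       % strongest input-to-target lag
ID = find(abs(rxt) > thresh) - 1;
% Periodic data make many lags significant; keep the smallest set that works
```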

% Network Creation

 > net = narxnet(1:delay,1:delay,neuronsHiddenLayer);

%% 4. Training the network

 > [Xs,Xi,Ai,Ts] = preparets(net,inputSeries,{},targetSeries);
% > net = train(net,Xs,Ts,Xi,Ai);
 [net,tr,Ys,Es,Xf,Af] = train(net,Xs,Ts,Xi,Ai);
 tr = tr     % display the training record to obtain important info
 > view(net)
 > Y = net(Xs,Xi,Ai);

% Performance for the series-parallel (open-loop) implementation:
% one-step-ahead prediction only

 > perf = perform(net,Ts,Y);

%% 5. Multi-step ahead prediction

 > inputSeriesPred  = [inputSeries(end-delay+1:end), inputSeriesVal];
 > targetSeriesPred = [targetSeries(end-delay+1:end), con2seq(nan(1,N))];
 > netc = closeloop(net);
 > view(netc)

Check netc on the previous (design) data. If its performance is poor, improve it by training netc on that data.
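That check can be sketched as follows, assuming the Deep Learning Toolbox and its simplenarx_dataset example data in place of the Nifty/Sensex files. The factor of 10 used to decide whether to retrain netc is a hypothetical threshold, not a recommended value.

```matlab
% Sketch: compare open-loop and closed-loop performance on the design data
[X,T] = simplenarx_dataset;               % example data shipped with the toolbox
net = narxnet(1:2,1:2,10);
net.divideFcn = 'divideblock';
net.trainParam.showWindow = false;        % suppress the training GUI

% Open-loop (series-parallel): one-step-ahead performance
[Xs,Xi,Ai,Ts] = preparets(net,X,{},T);
net = train(net,Xs,Ts,Xi,Ai);
Yo = net(Xs,Xi,Ai);
perfOpen = perform(net,Ts,Yo);

% Closed-loop (parallel) version evaluated on the same data
netc = closeloop(net);
[Xcs,Xci,Aci,Tcs] = preparets(netc,X,{},T);
Yc = netc(Xcs,Xci,Aci);
perfClosed = perform(netc,Tcs,Yc);
fprintf('open-loop MSE = %g, closed-loop MSE = %g\n', perfOpen, perfClosed);

% If the closed loop is much worse, continue training netc itself
if perfClosed > 10*perfOpen               % hypothetical threshold
    netc = train(netc,Xcs,Tcs,Xci,Aci);
    Yc = netc(Xcs,Xci,Aci);
    perfClosed = perform(netc,Tcs,Yc);
end
```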

 > [Xs,Xi,Ai,Ts] = preparets(netc,inputSeriesPred,{},targetSeriesPred);
 > yPred = netc(Xs,Xi,Ai);
 > perf = perform(netc,targetSeriesVal,yPred); % perform(net,targets,outputs)
 > figure;
 > plot([cell2mat(targetSeries), nan(1,N);
 >       nan(1,length(targetSeries)), cell2mat(yPred);
 >       nan(1,length(targetSeries)), cell2mat(targetSeriesVal)]')
 > legend('Original Targets','Network Predictions','Expected Outputs')

% The network predictions are very bad. I guess there is some problem
% with the closed loop's initial input states and initial layer states.
% Please help.

1. Optimize the input delays (ID) and feedback delays (FD)

2. Use trn/val/tst with 'divideblock' or 'divideind'

3. Compare netc and net performance on the open-loop (design) data

4. If necessary, use train on netc

5. Then consider nondesign data
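Regarding the initial-state concern in the question: one way to start the closed loop exactly where the design data end is to take the final delay states of the open-loop net and hand them to closeloop. A sketch, assuming the Deep Learning Toolbox, its simplenarx_dataset example data, a hypothetical horizon of N = 20 steps, and (as in the NARX case of the question) known future inputs.

```matlab
% Sketch: multi-step prediction that continues from the end of the design data
[X,T] = simplenarx_dataset;               % example data shipped with the toolbox
N = 20;                                   % hypothetical prediction horizon
Xdes = X(1:end-N);   Tdes = T(1:end-N);   % design data
Xnew = X(end-N+1:end);                    % future inputs, assumed known (NARX)
Tnew = T(end-N+1:end);                    % held out, used only to score the forecast

net = narxnet(1:2,1:2,10);
net.divideFcn = 'divideblock';
net.trainParam.showWindow = false;
[Xs,Xi,Ai,Ts] = preparets(net,Xdes,{},Tdes);
net = train(net,Xs,Ts,Xi,Ai);

% Final delay states of the open-loop net at the end of the design data...
[Ys,Xf,Af] = net(Xs,Xi,Ai);
% ...become the initial states of the closed-loop net
[netc,Xic,Aic] = closeloop(net,Xf,Af);

yPred = netc(Xnew,Xic,Aic);               % N-step-ahead prediction
perfTest = perform(netc,Tnew,yPred);      % nondesign (test) performance
```

This avoids rebuilding the initial states from a NaN-padded target series: the last delay values seen in training carry straight into the first closed-loop step.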

Hope this helps

Thank you for formally accepting my answer

Greg

MATLAB: How to improve results for river_dataset predicting ahead

You are misusing the term validation.

 total = design + test
 design = training + validation

The validation set is part of the design set and helps determine when to stop training. Estimates of performance on nondesign data (i.e., generalization) are obtained using a "holdout" test set.

Apply your data to the help example

 help narnet

Use divideblock to create the training, validation and test sets with uniform spacing.

Later you can see what happens if you try to exclude the validation set.

Initialize the RNG and obtain the training record tr:

 rng(0)
 [ net tr Ys Es Xf Af ] = train(net,Xs,Ts,Xi,Ai);
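For the situation in the original question, where neither future inputs nor future targets are available, one option consistent with the narnet suggestion above is to drop the external input and forecast the target from its own past. A sketch assuming the Deep Learning Toolbox, its simplenar_dataset example data and a hypothetical 10-step horizon; it combines rng(0), divideblock and the training record tr as described above.

```matlab
% Sketch: NAR forecast beyond the end of the data (no input series needed)
T = simplenar_dataset;                    % example target-only series
rng(0)                                    % reproducible weight initialization
net = narnet(1:2,10);                     % FD = 1:2, 10 hidden nodes
net.divideFcn = 'divideblock';            % uniformly spaced trn/val/tst blocks
net.trainParam.showWindow = false;

[Xs,Xi,Ai,Ts] = preparets(net,{},{},T);
[net,tr] = train(net,Xs,Ts,Xi,Ai);
fprintf('best epoch %d, test MSE %g\n', tr.best_epoch, tr.best_tperf);

% Hand the final open-loop states to the closed-loop net, then run it
% forward with an empty 0-by-N input because there is no external input
[Y,Xf,Af] = net(Xs,Xi,Ai);
[netc,Xic,Aic] = closeloop(net,Xf,Af);
N = 10;                                   % hypothetical forecast horizon
yFuture = netc(cell(0,N),Xic,Aic);
```

If an external input is still believed to help, the alternative is to forecast that input separately first; either way, errors accumulate as the horizon grows.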

Search for some of my posts

 greg narnet

Thank you for formally accepting my answer

Greg
