MATLAB: NARX prediction and generalization

narx, narxnet, nonlinear regression, prediction

Dear all,
I want to address a prediction problem with a NARX network. My goal is to predict 6 samples ahead, so I built the dataset as follows (file attached):
  • x is the time series with the last 6-1=5 samples dropped
  • y is a filtered version of x (smoothed with splines via csaps) with the first 5 samples dropped
I first train the narxnet in open loop, then in closed loop. The predictions on the same time series are not too bad. I also want to estimate how well the NARX predictor performs on a new, unseen time series. There the results are much worse, though not horrible.
I actually have 30 time series from 30 subjects, so my idea is to use a leave-one-out procedure: train the NARX on 29 time series, test on the remaining one, and repeat for each subject. With much more data, prediction and generalization should improve, but I cannot find how to train the net on several series. I saw an old question about this that was never resolved; hopefully someone has since found a solution.
Any ideas? Thank you so much!
close all;
clear all;
load('NARX_Data.mat');
% hidden neurons
NEUR=4;
% buffer sizes
NX=6;
NY=2;
% delays
DX=(1:NX);
DY=(1:NY);
% data preparation
xTrain=num2cell(xTrain);
yTrain=num2cell(yTrain);
xTest=num2cell(xTest);
yTest=num2cell(yTest);
% net creation
narx=narxnet(DX,DY,NEUR);
% narx.trainFcn='trainbr';
narx.divideFcn='';
% narx.divideFcn='divideind';
% narx.divideFcn='divideblock';
% training in open loop
[XsTrain,XiTrain,AiTrain,YsTrain]=preparets(narx,xTrain,{},yTrain);
narx=train(narx,XsTrain,YsTrain,XiTrain);
% use in open loop
% yOLTrainIdeal=cell2mat(YsTrain);
% yOLTrainPred=cell2mat(sim(narx,XsTrain,XiTrain));
% errOLTrain=yOLTrainPred-yOLTrainIdeal;
% figure;
% plot(errOLTrain)
% use in closed loop
yCLTrain=yTrain;
xCLTrain=xTrain;
% training in closed loop
narxCL=closeloop(narx);
[XsCLTrain,XiCLTrain,AiCLTrain,YsCLTrain]=preparets(narxCL,xCLTrain,{},yCLTrain);
narxCL=train(narxCL,XsCLTrain,YsCLTrain,XiCLTrain);
yCLTrainIdeal=cell2mat(YsCLTrain);
yCLTrainPred=cell2mat(narxCL(XsCLTrain,XiCLTrain,AiCLTrain));
% unseen test data
[XsCLTest,XiCLTest,AiCLTest,YsCLTest]=preparets(narxCL,xTest,{},yTest);
yCLTestIdeal=cell2mat(YsCLTest);
yCLTestPred=cell2mat(narxCL(XsCLTest,XiCLTest,AiCLTest));
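% comparative plot: targets vs. closed-loop predictions, train and test
figure; hold on
plot(yCLTrainIdeal,'b');
plot(yCLTrainPred,'r');
plot(yCLTestIdeal,'k');
plot(yCLTestPred,'g');
legend('train target','train prediction','test target','test prediction');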

Best Answer

I do not see the point:
1. Plots of x and y almost overlap, so x by itself is already a good estimate of y:
Rsqtrn = 1 - mse(yTrain-xTrain)/var(yTrain,1) % 0.87053
Rsqtst = 1 - mse(yTest-xTest)/var(yTest,1) % 0.92444
2. The Test data doesn't look anything like the Train data:
Rsqx = 1 - mse(xTrain-xTest)/var(xTrain,1) % -1.9215
Rsqy = 1 - mse(yTrain-yTest)/var(yTrain,1) % -2.0243
3. So why in the world should the net work on the Test data ???
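For readers who want to reproduce this kind of check, here is a minimal sketch of the R-squared measure used above, written without toolbox functions. The signals target, naive and other are synthetic stand-ins (assumptions, not the poster's data), and rsq is a hypothetical helper name.

```matlab
% Sketch only: synthetic signals, not the data from NARX_Data.mat.
t      = linspace(0, 4*pi, 500);
target = sin(t);                      % plays the role of y
naive  = target + 0.1*cos(7*t);       % nearly overlaps the target (like x vs y)
other  = cos(3*t) + 2;                % unrelated series (like Train vs Test)

% R^2 = 1 - MSE / variance of the target (variance normalized by N)
rsq = @(tgt, est) 1 - mean((tgt - est).^2) / var(tgt, 1);

RsqNaive = rsq(target, naive);        % close to 1: estimate tracks the target
RsqOther = rsq(target, other);        % negative: worse than predicting the mean
fprintf('RsqNaive = %.4f, RsqOther = %.4f\n', RsqNaive, RsqOther);
```

A value near 1 for the naive estimate sets the baseline a trained net has to beat, and a negative value means the estimate does worse than simply predicting the mean of the target.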
Hope this helps.
Greg
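
On the open question of training on several series: one possible approach, sketched here under assumptions, is to stack the series as parallel samples with catsamples before calling preparets. The per-subject data below is synthetic, and the names X, Y, nSubj and the moving-average smoothing are hypothetical stand-ins for the 30 real series. The caveat in the answer still applies: if the subjects do not resemble each other, more training series will not necessarily help.

```matlab
% Sketch only: synthetic stand-ins for the per-subject series.
% X, Y, nSubj and the smoothing filter are assumptions, not the poster's data.
nSubj = 5;
X = cell(1,nSubj);  Y = cell(1,nSubj);
for k = 1:nSubj
    T    = 200 + 10*k;                       % series may differ in length
    xRaw = cumsum(randn(1,T));               % random-walk input
    yRaw = filter(ones(1,5)/5, 1, xRaw);     % crude smoothing as the target
    X{k} = num2cell(xRaw);                   % 1-by-T cell, one sample per step
    Y{k} = num2cell(yRaw);
end

Rsq = zeros(1,nSubj);
for k = 1:nSubj
    trn = setdiff(1:nSubj, k);               % leave subject k out
    % Stack the training series as parallel samples; 'pad' fills the
    % shorter ones with NaN so all share the same number of time steps.
    xMul = catsamples(X{trn}, 'pad');
    yMul = catsamples(Y{trn}, 'pad');

    net = narxnet(1:6, 1:2, 4);              % same delays and neurons as the post
    net.divideFcn = '';                      % no validation/test split
    net.trainParam.showWindow = false;       % no training GUI
    net.trainParam.epochs = 100;             % keep the sketch quick
    [Xs,Xi,Ai,Ts] = preparets(net, xMul, {}, yMul);
    net = train(net, Xs, Ts, Xi, Ai);

    % Open-loop evaluation on the held-out subject
    [XsT,XiT,AiT,TsT] = preparets(net, X{k}, {}, Y{k});
    yPred  = cell2mat(net(XsT, XiT, AiT));
    yTrue  = cell2mat(TsT);
    Rsq(k) = 1 - mean((yTrue - yPred).^2) / var(yTrue, 1);
end
disp(Rsq)                                    % one held-out R^2 per subject
```

The held-out evaluation here is open loop; for closed-loop prediction the trained net could be converted with closeloop and re-prepared with preparets, as in the code in the question.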