MATLAB: Test data Neural Network

Tags: data division, dataset, Deep Learning Toolbox, MATLAB, neural network, test, timedelaynet, training

Hi everyone,
I wrote code for a time-delay neural network that predicts an output y two steps ahead from a matrix of inputs X. I have 630 timesteps. Using the "Sequential order incremental training with learning functions" algorithm (the one that gives me the best results), I always get good results: low MSE (about 1% of the mean of my targets) and R = 0.9 for all three data sets. I divide the data by blocks with a 3:1:1 ratio, so I can think of the test set as an independent data set added later (by definition, the test set is independent of the training set), and I still get good performance.
Then I tried it another way: I took the first 4/5 of my data and used them as my new inputs, leaving the last 1/5 as an independent data set for testing. I trained on those data without a test split (80% training, 20% validation). That last 1/5 therefore corresponds to the 20% of test data in my first simulation. So when I test the trained network on my independent data set (outputtest = net(Inputtest)), I expected performance similar to that of the test data in the first simulation (low MSE, R = 0.9), but that never happens (I have tried many times); I always get bad performance.
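Roughly, what I mean is something like this (not my exact script, just a sketch with placeholder sizes and names):
% second setup: hold out the last fifth, train on the rest without a test split
N    = 630;                               % total timesteps
Ntrn = floor(4*N/5);                      % first 4/5 for training + validation
Xtrn = X(:,1:Ntrn);      Ttrn = y(1:Ntrn);
Xtst = X(:,Ntrn+1:end);  Ttst = y(Ntrn+1:end);
net = timedelaynet(1:2, 10);              % placeholder: delays 1:2, 10 hidden neurons
net.trainFcn  = 'trains';                 % sequential order incremental training
net.divideFcn = 'divideblock';
net.divideParam.trainRatio = 0.8;         % 80% training
net.divideParam.valRatio   = 0.2;         % 20% validation
net.divideParam.testRatio  = 0;           % no test split here
[Xs,Xi,Ai,Ts] = preparets(net, con2seq(Xtrn), con2seq(Ttrn));
net = train(net, Xs, Ts, Xi, Ai);
% evaluate on the held-out last fifth
[Xst,Xit,Ait,Tst] = preparets(net, con2seq(Xtst), con2seq(Ttst));
outputtest = net(Xst, Xit, Ait);
perfTest   = mse(net, Tst, outputtest)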
Therefore my question is: what test data are actually used during training? If they are really independent, why don't I get similar results? Both the test data and new independent data should simply be passed through the network, using the weights already learned during training, to compute the outputs directly.
Is what I have written wrong? How can I get the same prediction performance on new data as on the test data?
Thank you!

Best Answer

I assume target T is 1-dimensional.
What is the size of the X matrix?
Are X(i,:) and T stationary (e.g., are the ten 63-point means and variances of each variable time-invariant)?
I recommend standardizing X and T with zscore or mapstd.
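For example (a sketch; this assumes X is variables-by-timesteps and T is a row vector):
[Xz, Xsettings] = mapstd(X);   % row-wise zero mean / unit variance
[Tz, Tsettings] = mapstd(T);
% or, with the Statistics Toolbox (zscore works column-wise, hence the transposes):
Xz = zscore(X')';
Tz = zscore(T')';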
What are the significant crosscorrelation function lags between X(i,:) and T?
Are you including zero input lag? Unfortunately, it is not a default.
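For example, something like this (a rough sketch; the significance threshold is only approximate):
xi = zscore(X(1,:));  t = zscore(T);      % one input row; repeat for each i
maxlag = 20;                              % choose to suit the problem
[c, lags] = xcorr(t, xi, maxlag, 'coeff');
sigthresh = 1.96/sqrt(length(t));         % approximate 95% significance level
siglags = lags(abs(c) > sigthresh)        % candidate input delays (may include 0)
net = timedelaynet(0:2, 10);              % include zero lag explicitly; the default is 1:2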
MSE results are more accurately assessed by % of target variance.
MSE00 = mean(var(T',1))   % MSE of the naive constant model y = mean(T,2)
NMSE  = MSE/MSE00         % normalized MSE
Rsq   = 1 - NMSE          % fraction of target variance modeled (search "coefficient of determination" in Wikipedia)
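For example, with a trained net (a sketch; assumes Xs, Xi, Ai, Ts were returned by preparets):
Y     = net(Xs, Xi, Ai);                  % outputs on the prepared data
MSE   = mse(net, Ts, Y)                   % mean squared error
MSE00 = mean(var(cell2mat(Ts)',1))        % target variance (naive-model MSE)
NMSE  = MSE/MSE00
Rsq   = 1 - NMSE                          % ~1 is good, ~0 is no better than the target mean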
Have you replaced the 'dividerand' (correlation-destroying) default division option with 'divideblock' or another divide option?
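For example (a sketch, using the 3:1:1 block ratios you mentioned):
net.divideFcn = 'divideblock';            % contiguous blocks instead of random indices
net.divideParam.trainRatio = 0.6;
net.divideParam.valRatio   = 0.2;
net.divideParam.testRatio  = 0.2;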
Hope this helps.
Thank you for formally accepting my answer.
Greg