Solved – Cross-validation with neural networks yielding worse results than a standard neural network

cross-validation, forecasting, MATLAB, neural networks, time series

Summary: I am using a 10-fold cross-validation procedure in which each training set is used to generate N bootstrap samples for processing with NNs. How do I provide my NN with the correct sequence and elements of my response variable series, matching the length of my input time series, so that I follow a proper cross-validation procedure with the bootstrap and obtain performance similar to (hopefully better than) a non-CV ANN? The purpose of the cross-validation bootstrap method is to estimate bias and variance so that I can create confidence intervals for my predictions. I am quite confident that my mistake is theory based. I am not sure if I am handling the response variable correctly when a) partitioning my response variable according to the partitioning of my input series using CV, and b) generating my input and target series to be fed into the NN. The full problem follows.

I'm working on a project that uses neural networks to fit a hydrological time series. I am trying to fit my one explanatory variable (X) to a response variable (T); both series are of size 429×1. I have run a simple NN using a script generated by nftool and achieved quite good results, with low MSE and R values above 0.9 for the training, validation, and testing sets. I'm new to ANNs, so the only variations in model architecture I am trying are the number of lags in the input time series and the number of hidden layers. Both range from 1 to 9, giving 81 different model architectures.
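A minimal sketch of how lagged inputs can be lined up with the response, written in plain Python rather than MATLAB. It assumes the goal is to predict T at one time step from the preceding `n_lags` values of X; the helper name `make_lagged` and the placeholder series are hypothetical, not taken from the original script.

```python
def make_lagged(x, t, n_lags):
    """Build (inputs, targets) pairs for a lagged model.

    inputs[i] holds the n_lags values of x that precede time step i + n_lags,
    and targets[i] is the value of t at that same time step.
    """
    if len(x) != len(t):
        raise ValueError("x and t must have the same length")
    if n_lags < 1 or n_lags >= len(x):
        raise ValueError("n_lags must be between 1 and len(x) - 1")
    n_rows = len(x) - n_lags
    inputs = [x[i:i + n_lags] for i in range(n_rows)]
    targets = [t[i + n_lags] for i in range(n_rows)]
    return inputs, targets


# Placeholder data standing in for the 429-point series (assumption).
x = list(range(429))
t = [2 * v for v in x]

inputs, targets = make_lagged(x, t, n_lags=3)
# The first n_lags time steps have no complete lag window, so they are dropped
# from BOTH inputs and targets; row i of inputs always pairs with targets[i].
```

With lags in play, any later partitioning or resampling would operate on these row indices, so that each lag window stays attached to its own target.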

I chose the best ANN architecture based on a combination of the coefficient of determination, MSE, MAE, and pers index. I now wish to run the best ANN architecture using a 10-fold cross-validation (CV) procedure combined with a bootstrap ensemble approach.

However, I am running into very poor performance when I run my bootstrapped cross-validation sets through the neural network, across the training, validation, and testing sets. The validation and testing sets perform worst, with R values around -0.1, while the training sets give R values around 0.2.

It is obvious to me that I have a problem, as a cross-validation bootstrap ensemble NN should almost always perform at least as well as a single NN with the same architecture.

One common occurrence I have noticed amongst my output responses from the ensemble NN is that there is very small variance in the outputs compared to the inputs and the approximated output bias at each point in the time series hovers around the median value of my input time series. I have also confirmed this by taking a histogram of my CV training sets versus the outputs provided by the NN.

I have run through my code quite a bit to figure out what could be responsible for such poor performance, and I am unsure whether I am feeding my ensemble ANN the correct response variable for each training/test set. My basic approach is to generate 10 different input training/test sets using the cvpartition function. I use a simple for loop to place each of the 10 train/test sets into a cell array. For every train/test set of my input series, I take its indices with respect to the original time series and create a matching train/test set of my response time series that mirrors the input. Is this incorrect? Should I always use the same response series when testing each individual train/test set of the input series provided by cvpartition?
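The partitioning step described above can be sketched as follows, in plain Python rather than MATLAB. This is only an illustration under assumptions: `kfold_indices` and `split_pairs` are hypothetical helpers standing in for what cvpartition and the for loop do, and the data are placeholders. The point it shows is that the same index list is applied to both X and T, so each input keeps its own target.

```python
import random


def kfold_indices(n, k, seed=0):
    """Split range(n) into k shuffled, disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def split_pairs(x, t, test_idx):
    """Index x and t with the SAME indices so the pairs stay aligned."""
    test_set = set(test_idx)
    train_idx = [i for i in range(len(x)) if i not in test_set]
    train = ([x[i] for i in train_idx], [t[i] for i in train_idx])
    test = ([x[i] for i in test_idx], [t[i] for i in test_idx])
    return train, test


# Placeholder series with a known X -> T relationship (assumption).
x = [float(i) for i in range(429)]
t = [3.0 * v + 1.0 for v in x]

folds = kfold_indices(len(x), k=10)
aligned = True
for test_idx in folds:
    (x_tr, t_tr), (x_te, t_te) = split_pairs(x, t, test_idx)
    # Every (input, target) pair must still satisfy the original relationship.
    aligned = aligned and all(tv == 3.0 * xv + 1.0 for xv, tv in zip(x_tr, t_tr))
    aligned = aligned and all(tv == 3.0 * xv + 1.0 for xv, tv in zip(x_te, t_te))
```

In this sketch the response is never altered; each fold simply selects a subset of the original (X, T) pairs.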

I ask this because in my 10-fold CV bootstrap procedure I resample each of my unique training sets N times to create my bootstrap samples. I then find the set difference of elements between each unique test set and each of the bootstrap samples created from it, creating N sets of elements to be used for validation in the neural networks. As you can imagine, my validation sets will always vary in size, while my bootstrapped training sets will always be of consistent size because they are resampled to the same length as the original training sets. I then pair each unique test set with the bootstrapped training sets and the validation sets created by their set difference, and use this as my input series to the NN. For my response variable, I create target data for the NN by combining the unique train sets (again, I am not sure this is proper procedure) with the unique test set, and then indexing my response time series with the set difference between the bootstrap samples and the unique train sets of the input series to determine which elements of the response time series go into the validation set of the target data. I then feed both series into the NN so it can fit my data.
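One way to keep inputs and targets consistent in a procedure like this is to resample indices rather than values, and to use one combined index list for both series. The sketch below is plain Python under stated assumptions: it treats the validation set as the training indices left out of a bootstrap sample (the out-of-bag elements), uses a stand-in test fold, and the helper `bootstrap_split` is hypothetical.

```python
import random


def bootstrap_split(train_idx, rng):
    """Resample train_idx with replacement.

    Returns the bootstrap indices (same length as train_idx, with repeats)
    and the out-of-bag indices, i.e. training indices that were never drawn.
    """
    boot_idx = [rng.choice(train_idx) for _ in train_idx]
    oob_idx = sorted(set(train_idx) - set(boot_idx))
    return boot_idx, oob_idx


rng = random.Random(42)
n = 429
x = [float(i) for i in range(n)]          # placeholder input series
t = [0.5 * v - 2.0 for v in x]            # placeholder response series

test_idx = list(range(0, n, 10))          # stand-in for one CV test fold
test_set = set(test_idx)
train_idx = [i for i in range(n) if i not in test_set]

boot_idx, oob_idx = bootstrap_split(train_idx, rng)

# One index list drives both series, so the two vectors have equal length
# and position j of the inputs always pairs with position j of the targets.
all_idx = boot_idx + oob_idx + test_idx
net_inputs = [x[i] for i in all_idx]
net_targets = [t[i] for i in all_idx]
```

The combined vectors can be longer than the original 429 points because bootstrap indices repeat. The response is not altered by this; the same (X, T) pairs are simply selected more than once.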

I am quite confident that my mistake is theory based. I am not sure if I am handling the response variable correctly when a) partitioning my response variable according to the partitioning of my input series using CV, and b) when generating my input and target series to be input into the NN.

Has anyone run into a similar problem before, or can anyone be so kind as to point me in the right direction for fixing these poor NN results?

I greatly appreciate any help, and if code is needed do not hesitate to ask.


@Douglas Zare I do not alter the weights in any direct way in my code; I let the network randomly initialize and update the weights as it trains. The only parts of the NN architecture I am altering are the number of input nodes and hidden nodes. I made a mistake above: I do not have up to 9 hidden layers. I have only one hidden layer, with the number of nodes varying from 1 to 9. If I have not touched the weights in any form, could this still be the reason why my network is training poorly? I sincerely think I am making a procedural mistake when I create the target vector to feed the NN. My original time series has only 429 values, but with the 10-fold CV bootstrap technique, my input vector to the NN will always be longer because it includes the variably sized validation set, which is calculated as the set difference between a particular bootstrap resample and each 10-fold CV training set. To compensate for these additional data points, I index the elements in my input validation set, and then use that index on my target series to extract the corresponding elements and place them in the validation set of my target values. I was always under the impression that you should never alter your response variable, but I am unsure how I am supposed to create a valid target data set when the number of data points exceeds the length of the original time series.

Best Answer

Have a look at the weights generated in your CV model to see whether any are extremely large or small. If there are, you can adjust the learning rate to keep the weight updates within a reasonable range.

In the meantime, try plotting the learning curve to see whether your NNs ever learned anything, or learned and then overfitted.
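As a rough illustration of what a learning curve records, the sketch below tracks training and validation MSE after every epoch. It is plain Python under loud assumptions: a one-variable linear model trained by gradient descent stands in for the network, and the data, learning rate, and function names are placeholders rather than anything from the question.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_with_curve(x_tr, y_tr, x_val, y_val, lr=0.1, epochs=200):
    """Full-batch gradient descent that logs both errors once per epoch."""
    w, b = 0.0, 0.0
    train_curve, val_curve = [], []
    n = len(x_tr)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(x_tr, y_tr)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(x_tr, y_tr)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        train_curve.append(mse(w, b, x_tr, y_tr))
        val_curve.append(mse(w, b, x_val, y_val))
    return train_curve, val_curve


# Placeholder data: inputs scaled to [0, 1) with a known linear response.
xs = [i / 100 for i in range(100)]
ys = [2.0 * x + 0.5 for x in xs]
x_tr, y_tr = xs[::2], ys[::2]        # even positions for training
x_val, y_val = xs[1::2], ys[1::2]    # odd positions for validation

train_curve, val_curve = train_with_curve(x_tr, y_tr, x_val, y_val)
```

Plotting the two lists against the epoch number gives the curve. Two flat lines suggest nothing was learned, while a training error that keeps falling as the validation error rises suggests overfitting.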

Ensure your resampling indices are properly randomly generated and that each target always matches its associated sample.
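Both checks can be automated. The following is a small sketch in plain Python, with hypothetical names (`pairs_match`) and placeholder data: it verifies that every resampled (input, target) pair equals the original pair at its index, shows that the check fails when the targets are left in their original order, and looks at how many distinct indices a bootstrap draw contains.

```python
import random


def pairs_match(x, t, idx, x_sampled, t_sampled):
    """True if each resampled pair equals the original pair at its index."""
    if not (len(idx) == len(x_sampled) == len(t_sampled)):
        return False
    return all(x[i] == xs and t[i] == ts
               for i, xs, ts in zip(idx, x_sampled, t_sampled))


rng = random.Random(7)
x = [float(i) for i in range(429)]      # placeholder input series
t = [v ** 0.5 for v in x]               # placeholder response series

# Bootstrap indices: drawn with replacement, same length as the series.
idx = [rng.randrange(len(x)) for _ in range(len(x))]
x_boot = [x[i] for i in idx]
t_boot = [t[i] for i in idx]

ok = pairs_match(x, t, idx, x_boot, t_boot)      # targets resampled with inputs

# A mismatch of the kind to look for: inputs resampled, targets left as-is.
mismatched = pairs_match(x, t, idx, x_boot, t)

# Sanity check on the randomness: a bootstrap sample of size n is expected to
# contain a fraction of about 1 - (1 - 1/n)**n of the distinct indices,
# which is roughly 63% for large n.
unique_fraction = len(set(idx)) / len(x)
```

A unique fraction far from that expected value, or a failed pairing check, would point to a problem in the resampling code rather than in the network itself.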