How to select validation data when training a neural network

Tags: machine learning, neural networks, validation

I am training a neural network on time-dependent financial data. To avoid overfitting, I would like to stop training at the point where the network stops improving on a set of validation data that is separate from the training data.
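The stopping rule described here is usually called early stopping. Below is a minimal sketch of a patience-based version, assuming validation losses arrive one per epoch. The function name and the `patience` parameter are illustrative choices, not taken from any particular library.

```python
from typing import Iterable


def early_stopping_epoch(val_losses: Iterable[float], patience: int = 5) -> int:
    """Return the index of the epoch with the lowest validation loss,
    stopping once `patience` consecutive epochs fail to improve on it.
    Returns -1 if no losses were supplied."""
    best_epoch, best_loss, waited = -1, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember this epoch (in practice, also checkpoint the weights here).
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop consuming losses, i.e. stop training
    return best_epoch


losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9, 0.5]
print(early_stopping_epoch(losses, patience=3))   # 2: stops before ever seeing the 0.5
print(early_stopping_epoch(losses, patience=10))  # 7
```

If `val_losses` is a generator that trains for one epoch and then yields the validation loss (a hypothetical helper, not shown), the `break` means training genuinely stops rather than just being ignored. The patience value trades off stopping too early on a noisy validation curve against wasting epochs after the true minimum.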

My question is: how should I best divide my data into training and validation sets? Should the validation data be selected at random, or taken entirely from the end of the sample?
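The two options in the question can be sketched as index splits over a time-ordered sample. This assumes samples are indexed in chronological order and uses an illustrative 20% validation fraction; the function names are hypothetical.

```python
import random


def chronological_split(n_samples, val_fraction=0.2):
    """Train on the earliest samples, validate on the final `val_fraction`."""
    n_val = max(1, round(n_samples * val_fraction))
    cut = n_samples - n_val
    return list(range(cut)), list(range(cut, n_samples))


def random_split(n_samples, val_fraction=0.2, seed=0):
    """Validate on a random subset of samples, ignoring time order."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_val = max(1, round(n_samples * val_fraction))
    return sorted(idx[n_val:]), sorted(idx[:n_val])


def trains_on_future(train_idx, val_idx):
    """True if some training sample is later in time than some validation sample."""
    return max(train_idx) > min(val_idx)


train, val = chronological_split(100)
print(trains_on_future(train, val))  # False: validation lies strictly after training

train, val = random_split(100)
print(trains_on_future(train, val))
```

With the random split, training and validation indices are interleaved, so the network is fitted on data that comes after some of the points it is validated on. For a forecasting task, the end-of-sample split mirrors how the model will actually be used.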

I am trying to predict stock prices from a window of past prices. If I select validation data at random, could this lead to information leakage?
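One concrete way to see the leakage is to list which validation targets also appear somewhere in the training data. This sketch assumes the sample starting at time `i` uses `prices[i:i + window]` as input and `prices[i + window]` as its target; the helper names are hypothetical.

```python
def times_used(start, window):
    """Time steps touched by the sample starting at `start`:
    inputs prices[start:start + window] and target prices[start + window]."""
    return set(range(start, start + window + 1))


def leaked_targets(train_idx, val_idx, window):
    """Validation target times that the network also sees during training,
    either inside an input window or as a training target."""
    seen = set()
    for i in train_idx:
        seen |= times_used(i, window)
    return sorted(i + window for i in val_idx if i + window in seen)


# Chronological split: the last 20 of 100 windows validate.
print(leaked_targets(range(80), range(80, 100), window=10))  # []

# Interleaved split (even samples train, odd samples validate).
print(leaked_targets(range(0, 20, 2), range(1, 20, 2), window=5))
```

In the interleaved case, almost every validation target is a value the network has already been shown as part of a neighbouring training window, so the validation error will look optimistic and early stopping will trigger at the wrong point. Splitting by time avoids this, because every validation target lies after all the training data.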

Best Answer

If selecting validation data is difficult because of dependencies in the data, it may be better to use Bayesian regularisation instead of early stopping to avoid over-fitting, because no validation set is then required. Most neural network packages implement this procedure; for MATLAB I would recommend NETLAB.
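To illustrate why no validation set is needed, here is a minimal sketch of MacKay's evidence procedure, which re-estimates the weight-decay coefficient α and the noise precision β from the training data alone. For simplicity it assumes a model that is linear in its parameters (fixed basis functions `Phi`), not a neural network, and it is not NETLAB's code. For a network, the same style of updates is applied with a Gaussian approximation around the current weights, roughly with the Hessian of the data error taking the place of `Phi.T @ Phi`.

```python
import numpy as np


def evidence_fit(Phi, t, alpha=1.0, beta=1.0, n_iter=200, tol=1e-8):
    """Re-estimate alpha (weight-decay / prior precision) and beta (noise
    precision) by maximising the evidence for the model t ~ Phi @ w,
    using only the training data (MacKay's fixed-point updates)."""
    N, M = Phi.shape
    PtP, Ptt = Phi.T @ Phi, Phi.T @ t
    eig = np.linalg.eigvalsh(PtP)  # eigenvalues of Phi^T Phi (symmetric PSD)
    for _ in range(n_iter):
        A = alpha * np.eye(M) + beta * PtP      # posterior precision of w
        m = beta * np.linalg.solve(A, Ptt)      # posterior mean of w
        lam = beta * eig
        gamma = np.sum(lam / (lam + alpha))     # effective number of well-determined parameters
        resid = t - Phi @ m
        new_alpha = gamma / (m @ m)
        new_beta = (N - gamma) / (resid @ resid)
        done = (abs(new_alpha - alpha) <= tol * alpha
                and abs(new_beta - beta) <= tol * beta)
        alpha, beta = new_alpha, new_beta
        if done:
            break
    # Final posterior mean under the re-estimated hyperparameters.
    m = beta * np.linalg.solve(alpha * np.eye(M) + beta * PtP, Ptt)
    return m, alpha, beta, gamma


# Illustrative synthetic data (not financial): noisy sine, degree-5 polynomial basis.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
Phi = np.vander(x, 6, increasing=True)
m, alpha, beta, gamma = evidence_fit(Phi, t)
print(f"alpha={alpha:.3g}  noise std={1 / np.sqrt(beta):.3g}  gamma={gamma:.2f} of {Phi.shape[1]}")
```

The amount of regularisation (α) is chosen by the data through the evidence rather than by monitoring a held-out set, which is what makes the approach attractive when a clean validation split is hard to construct.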

Note, however, that dependencies in the data may cause model mis-specification, as the standard Bayesian approach assumes the data are i.i.d. So it is worth a try, but it is not guaranteed to work.
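To see where the i.i.d. assumption enters, take the usual regression setting for Bayesian regularisation (assumed here for illustration): Gaussian noise with precision $\beta$ on each target and a Gaussian prior with precision $\alpha$ on the weights $\mathbf{w}$. The likelihood then factorises over the $N$ training pairs, and the negative log posterior is the familiar regularised sum-of-squares error:

$$
p(\mathcal{D} \mid \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \,\middle|\, y(\mathbf{x}_n; \mathbf{w}),\, \beta^{-1}\right),
\qquad
p(\mathbf{w} \mid \alpha) = \mathcal{N}\!\left(\mathbf{w} \,\middle|\, \mathbf{0},\, \alpha^{-1}\mathbf{I}\right)
$$

$$
-\ln p(\mathbf{w} \mid \mathcal{D}, \alpha, \beta) = \frac{\beta}{2}\sum_{n=1}^{N}\bigl(y(\mathbf{x}_n; \mathbf{w}) - t_n\bigr)^2 + \frac{\alpha}{2}\lVert \mathbf{w} \rVert^2 + \text{const.}
$$

The product over $n$, and therefore the plain sum of squared errors, holds only if the noise on each target is independent. If the errors on consecutive prices are correlated, the correct likelihood would need a full noise covariance with cross terms between time steps, so the re-estimated $\alpha$ and $\beta$ can be misleading.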
