Solved – Training error less than validation error, but higher than test error

machine learningmodel-evaluationregressiontime seriesvalidation

I have a time series regression prediction problem. So I divided the dataset into 3 parts:

  • training (first 70% of the time series data)
  • validation (from 70% to 85% of the time series data)
  • test set (last 15% of the data)

Then trained the model for some epochs and used earlystopping callback (keras callback) on validation set. By using earlystopping, I will be able to stop the model from further training, if no improvement is detected on the validation set.

Then calculated errors of the predictions on each dataset. Here are the results:

  1. Training dataset Mean-Squared-Error = 921.4
  2. Validation dataset Mean-Squared-Error = 1200.2
  3. Test dataset Mean-Squared-Error = 300

From this question I concluded that my model is acting normally, because Training error is not higher than validation error.

I know that my Test dataset is easier than training and validation sets to predict and this is the reason for lower error. Does my model have problems? Should always test and validation errors be more than train errors? Is my model good at generalization?

Best Answer

One of the best ways to check if the model is doing good is to build a graph between the training error and validation error. Ideally it should look something like this. enter image description here

Graph from LSTM based stock price prediction for Mitsubishi Stock (Nikkei Index) Kernel location

As you can see, that validation error is not really improving with progressing epochs, with the minimum validation error at (iteration x).

In case your training/validation graphs look something like this, then I guess the model is doing good.

Additional Question:

  1. Looking at your values of train/val error, it seems these are inverse scaled errors that you calculated manually after model learning was completed, correct? Else if these values are actual error metrics given out by modelling technique, then I would recommend scaling the data before feeding it to the network.