Solved – Should I choose the best model based on validation error or test error?

cross-validation, keras, model-selection, model-evaluation, neural-networks

I divided my dataset into training, validation, and test sets, then trained multiple forecasting models on the training set. I now have 3 errors for each model:

  1. Training error
  2. Validation error
  3. Test error

My question is: which model is the best? The one with the lowest error on the validation set, or the one with the lowest error on the test set?

I should also mention that I used the validation set to monitor training progress, and I also used the EarlyStopping callback (a predefined Keras callback) on the validation set. So when a model stops improving on the validation set, training stops.
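To make the stopping behaviour concrete, here is a minimal pure-Python sketch of the rule EarlyStopping implements (this is not Keras code; the loss values and `patience` setting are invented for the example):

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights we would keep:
    the one with the lowest validation loss, once the loss has
    failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, wait = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait >= patience:
                break  # stop training; keep the best weights seen so far
    return best_epoch

# Hypothetical validation losses: improving, then degrading (overfitting).
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.8]))  # -> 2
```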

Best Answer

Model training runs over multiple epochs. In every epoch, the model learns from the training data and then makes predictions on the validation data. Ideally, as epochs pass, your training accuracy increases and your validation accuracy increases along with it (if accuracy is the metric you are using to train your model). Equivalently, the training error decreases and the validation error decreases with it.

Eventually, after a certain number of epochs, your training accuracy keeps increasing but your validation accuracy starts decreasing. This is the point where your EarlyStopping callback triggers and stops the learning process. It does so because, if training accuracy keeps increasing while validation accuracy stagnates or drops, your model is entering an overfitting phase (memorizing the training data). You would then work on regularizing your model (tuning its hyperparameters) to avoid such stagnation and/or degradation of the validation error after a certain number of epochs.

So, when comparing models, the error to minimize is the validation error. This is the basis on which you finalize your model.
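In code, the selection rule is just an argmin over validation errors. The model names and error values below are invented for illustration:

```python
# Hypothetical (model, validation error) pairs from three trained models.
val_errors = {"model_a": 0.31, "model_b": 0.24, "model_c": 0.27}

# Pick the model with the lowest validation error.
best_model = min(val_errors, key=val_errors.get)
print(best_model)  # -> model_b
```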

Finally, you evaluate your model on the test data to see how close the test error is to your validation error.

This can turn out in one of the following ways:

  • Validation Error and Test Error are almost same

    Your model is doing well, and it can be said to have generalized to a fair level.

  • Validation Error and Test Error are very different

    Either your test data and validation data are not IID, or the train/validation/test splitting mechanism was not appropriate. It could also mean that your model needs further tuning to generalize better.

  • Validation Error and Test Error are somewhat close

    This is more of a judgement call now. Based on your application or project requirements, you can decide between re-training and finalizing your model.
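The three cases above can be turned into a rough diagnostic. Note that the 10% and 30% relative-gap thresholds below are arbitrary assumptions for the sketch, not established cut-offs; what counts as "close" depends on your application:

```python
def generalization_check(val_err, test_err, close=0.10, far=0.30):
    """Classify the validation/test gap by relative difference."""
    gap = abs(test_err - val_err) / max(val_err, 1e-12)
    if gap <= close:
        return "generalizes well"
    if gap >= far:
        return "check IID assumption / data splits / tuning"
    return "judgement call"

print(generalization_check(0.25, 0.26))  # -> generalizes well
print(generalization_check(0.25, 0.40))  # -> check IID assumption / data splits / tuning
```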
