Solved – Is testing on the test set after hyperparameter tuning (with cross-validation) necessary?

cross-validation, hyperparameter, machine learning, python

My problem: I have some rule-based algorithms and various machine learning algorithms (random forest, boosting, …) that I want to compare for a specific use case.

Since I want to optimize the hyperparameters of my classifiers, I think I need to split my (small) dataset into three partitions:

  1. Training data
  2. Validation data
  3. Test data

I perform hyperparameter tuning on the training data and validate on the validation data. After I have found the "best" parameters, I train the model with those parameters on my training data and test on my test data.
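For illustration, here is a minimal sketch of that workflow with scikit-learn. The classifier, split proportions, candidate `n_estimators` values, and the names `X`/`y` are all arbitrary assumptions, not part of the original question:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: X is the feature matrix, y the class labels.
# Hypothetical split: 60% train, 20% validation, 20% test (stratified, since the classes are imbalanced).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42)

# Tune on (train, validation): pick the hyperparameter value with the best validation accuracy.
best_score, best_n = -1.0, None
for n in (100, 300, 500):  # arbitrary candidate values
    clf = RandomForestClassifier(n_estimators=n, random_state=42).fit(X_train, y_train)
    score = clf.score(X_val, y_val)
    if score > best_score:
        best_score, best_n = score, n

# Refit with the chosen hyperparameters on the training data, then evaluate once on the test set.
final_clf = RandomForestClassifier(n_estimators=best_n, random_state=42).fit(X_train, y_train)
print("test accuracy:", final_clf.score(X_test, y_test))
```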

I could do this for all my algorithms (the rule-based ones won't need training; I'll just test them on the test set) and have, in my opinion, comparable results.

For parameter tuning I want to use GridSearchCV and/or RandomizedSearchCV, which both validate using cross-validation with the specified number of folds. In that case I would not need the validation dataset and would use the training set alone for parameter tuning.
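A minimal sketch of that variant, again with an arbitrary classifier and parameter grid: GridSearchCV performs the cross-validation internally on the training portion, so only a train/test split is needed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Only a train/test split now; the validation folds come from cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

param_grid = {"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20]}  # arbitrary grid
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                  # 5-fold cross-validation on the training data
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean CV accuracy of best parameters:", search.best_score_)
```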

In the end I test again on my test set. And here is my problem: is it really necessary? GridSearchCV, for example, will do cross-validation for all combinations of parameters I set and report a mean accuracy (or some other score). Is the mean accuracy of this cross-validation not a meaningful metric for future predictions? I don't see the point of testing again on the test set.

I'm a bit concerned because my dataset is relatively small (3,000 data points across 12 classes with 14 features, imbalanced), so if I just test on the test set (without cross-validation), might the result be noisy?

If someone has tips for a general approach I should pursue, thanks!

Best Answer

Yes, as annoying as it can be, it is really important to test your best model on the test data.

It is the same principle as when you train on the training data: you don't validate on that same data. Here, hyperparameter tuning is part of the training, so you shouldn't test on the data you used to tune your hyperparameters, namely the training and validation data.
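To make that concrete, continuing the hypothetical GridSearchCV sketch from the question: `search.best_score_` was used to *choose* the hyperparameters, so it tends to be optimistically biased, while the untouched test set gives the final check on unseen data. This is only a sketch under those assumptions:

```python
# search.best_estimator_ is the model with the chosen hyperparameters,
# refit on the whole training set (GridSearchCV's default refit=True).
# The test set was never seen during tuning, so this score is the honest estimate.
test_accuracy = search.best_estimator_.score(X_test, y_test)
print("held-out test accuracy:", test_accuracy)
```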