Solved – Do we need a test set when using k-fold cross-validation

cross-validation, out-of-sample, validation

I've been reading about k-fold cross-validation, and I want to make sure I understand how it works.

I know that for the holdout method, the data is split into three sets (training, validation, and test), and the test set is only used at the very end to assess the performance of the model, while the validation set is used for tuning hyperparameters, etc.

In the k-fold method, do we still hold out a test set for the very end and use only the remaining data for training and hyperparameter tuning, i.e. split the remaining data into k folds and then use the average accuracy across the folds (or whatever performance metric we choose) to tune our hyperparameters? Or do we not use a separate test set at all and simply split the entire dataset into k folds (in which case, I assume we just take the average accuracy across the k folds as our final accuracy)?

Best Answer

In the k-fold method, do we still hold out a test set for the very end and use only the remaining data for training and hyperparameter tuning, i.e. split the remaining data into k folds and then use the average accuracy across the folds (or whatever performance metric we choose) to tune our hyperparameters?

Yes. As a rule, the test set should never be used to change your model (e.g., its hyperparameters).
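A minimal sketch of that workflow, assuming scikit-learn, a synthetic dataset, and an SVM with C as the hyperparameter being tuned (none of which come from the question itself): hold out a test set, tune on the rest with k-fold cross-validation, and touch the test set exactly once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Synthetic data, just for illustration.
X, y = make_classification(n_samples=500, random_state=0)

# Hold out a test set that is never used for tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 5-fold cross-validation on the training portion to choose C.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)
print("Mean CV accuracy of the selected C:", search.best_score_)

# The test set is used exactly once, at the very end,
# to report the performance of the tuned model.
print("Test accuracy:", search.score(X_test, y_test))
```

The cross-validated score guides the choice of C; the test score is reported but never fed back into any modeling decision.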

However, cross-validation can sometimes be used for purposes other than hyperparameter tuning, e.g. determining to what extent the train/test split impacts the results.
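As an illustration of that second use, here is a small sketch (again assuming scikit-learn and the same synthetic setup) that looks at the per-fold scores: their spread gives a sense of how much a single train/test split could have swung the reported result.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# 10-fold cross-validation with a fixed model (no tuning here).
scores = cross_val_score(SVC(C=1.0), X, y, cv=10)
print("Per-fold accuracy:", scores)
print("Mean +/- std: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```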
