Solved – Cross-validation misuse (reporting performance for the best hyperparameter value)

cross-validation, model-selection, model-evaluation, references

Recently I came across a paper that proposes using a k-NN classifier on a specific dataset. The authors used all of the available samples to perform k-fold cross-validation for different values of k (the number of neighbors) and report the cross-validation results of the best hyperparameter configuration.

To my knowledge, this estimate is biased; they should have retained a separate test set to obtain an accuracy estimate on samples that were not used for hyperparameter optimization.

Am I right? Can you provide some references (preferably research papers) that describe this misuse of cross-validation?

Best Answer

Yes, there are issues with reporting only the k-fold CV results of the selected hyperparameter. You could use, e.g., the following three publications (though there are more out there, of course) to point people in the right direction:

I personally like these because they state the issues in plain English rather than in math.
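
To make the bias concrete, here is a minimal sketch contrasting the two protocols. The dataset, hyperparameter grid, and fold counts are illustrative assumptions, not anything from the paper in question; the point is that the "best hyperparameter" CV score reuses the same splits for tuning and evaluation, while nested CV evaluates the whole selection procedure on data the search never saw.

```python
# Sketch: optimistically biased "best CV score" vs. nested cross-validation.
# Dataset, grid, and fold counts are assumptions chosen for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Biased protocol: tune on all the data and report the CV score of the
# winning hyperparameter value (the score used to pick it).
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=inner_cv)
search.fit(X, y)
print("Best-hyperparameter CV score (optimistically biased):", search.best_score_)

# Nested CV: each outer fold evaluates a model whose hyperparameter was
# chosen on the remaining data only, giving a less biased estimate of the
# full model-selection procedure.
nested_scores = cross_val_score(
    GridSearchCV(KNeighborsClassifier(), param_grid, cv=inner_cv),
    X, y, cv=outer_cv,
)
print("Nested CV score:", nested_scores.mean())
```

On most datasets the first number will be somewhat higher than the second, which is exactly the optimism the references above discuss.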