Solved – Double (Nested, Wrapper) Cross-Validation – Final Trained Model


I'm performing a study in which I select the kernel type and hyperparameters for an SVR model in an inner CV loop, with an outer loop performing 10-fold CV. The output is 10 trained models and their performance measures.

My question is where to go from here. If I train a new model on the complete dataset using the selected kernel (with either the hyperparameters that gave the minimum error during the 10-fold CV, or hyperparameters re-optimized for that kernel on the complete dataset), the final model I end up with is never itself validated on held-out data. Is it reasonable to do this and report the average error previously obtained from the 10-fold CV as an "informal" performance estimate, given that the final model is trained on a slightly larger dataset? How would I word this in a journal paper? My thesis advisor, for one, is questioning it.

Best Answer

It sounds like you're taking the correct approach: you'll want to do nested CV so that you tune your parameters on the inner folds and then estimate the error on a held-out fold that the model has never seen before. As an example:

Divide your training set into 10 folds. Use 9 of the folds to tune your model (again through CV), and then estimate the error on the 10th fold that you held out. Repeat this 10 times, once per held-out fold, to get an estimate of the error (you could actually repeat the whole procedure with a different random split into folds to get further estimates).
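For concreteness, here is a minimal sketch of that procedure in Python with scikit-learn. This is my own illustration, not part of the original question: X and y are placeholder data, and the hyperparameter grid is just an example.

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

    # Placeholder data -- substitute your own feature matrix and targets.
    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)
    y = rng.rand(200)

    # Candidate kernels and hyperparameters searched in the inner loop.
    param_grid = [
        {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": ["scale", 0.1]},
        {"kernel": ["linear"], "C": [1, 10, 100]},
    ]

    inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)

    # Inner loop: tune kernel and hyperparameters by CV on the 9 training folds.
    tuned_svr = GridSearchCV(SVR(), param_grid, cv=inner_cv,
                             scoring="neg_mean_squared_error")

    # Outer loop: estimate generalization error on each held-out 10th fold.
    outer_scores = cross_val_score(tuned_svr, X, y, cv=outer_cv,
                                   scoring="neg_mean_squared_error")
    print("Estimated MSE: %.3f (+/- %.3f)"
          % (-outer_scores.mean(), outer_scores.std()))

    # Final model: rerun the tuning on the complete dataset and keep the winner;
    # the nested-CV error above is the performance estimate you report for it.
    final_model = tuned_svr.fit(X, y).best_estimator_

The key point is that the whole tuning procedure (GridSearchCV) is itself the estimator being cross-validated in the outer loop, so the reported error is never computed on data that the tuning saw.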

The Elements of Statistical Learning explicitly warns against "[Using] cross-validation to estimate the unknown tuning parameters and to estimate the prediction error of the final model" (emphasis mine; see Chapter 7, Section 7.10.2). You can, of course, use CV to estimate these two things separately.
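To make that warning concrete, here is a small follow-on sketch (again my own illustration, reusing X, y, tuned_svr, and outer_cv from the snippet above): the score that selected the hyperparameters is optimistically biased and should not be reported as the final model's performance, whereas the outer loop's score can be.

    from sklearn.model_selection import cross_val_score

    # Wrong: reporting the score of the same CV that chose the hyperparameters.
    tuned_svr.fit(X, y)
    print("Biased estimate (MSE):", -tuned_svr.best_score_)

    # Right: reporting the error from a separate outer CV the tuning never saw.
    outer_scores = cross_val_score(tuned_svr, X, y, cv=outer_cv,
                                   scoring="neg_mean_squared_error")
    print("Nested-CV estimate (MSE):", -outer_scores.mean())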

If you need another citation for the importance of this point, some researchers at Google just released a related paper. If you have access to the article in Science, they even released their Python code along with it.