Solved – CV for model parameter tuning AND then model evaluation

Tags: cross-validation, parameterization, random-forest

I have a basic question on using cross-validation for model parameter tuning (model training) and model evaluation (testing), similar to this question: Model Tuning and Model Evaluation in Machine Learning.

I understand that it is suggested to use only the training set (the test set remains 'unseen') to tune the model parameter ('mtry'; I am using Random Forest (RF)), i.e. the training set is split further into training and validation sets for k-fold cross-validation to obtain the optimum parameter value.
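
For concreteness, here is a minimal sketch of that tuning step in Python with scikit-learn, where max_features plays the role of 'mtry'; the toy dataset, grid values, and fold count are assumptions for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

# Toy data as a stand-in for the real dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out a test set that stays 'unseen' during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Tune 'mtry' (max_features in scikit-learn) by k-fold CV on the training set only
param_grid = {"max_features": [2, 4, 8, 16]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print("tuned max_features:", search.best_params_["max_features"])
```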

However, I am confused about what to do if I then wish to use k-fold cross-validation to evaluate the model accuracy (to test the trained model on different test sets sampled from the whole dataset). Is the right model evaluation procedure to:

(1) Simply rerun RF, with the parameter 'mtry' tuned by CV on the training set only, on different training-test partitions, even though only one realization/partition of the training set was used to tune 'mtry' at the beginning? Or should I tune 'mtry' using different training-set realizations to begin with?

(2) Run RF with the tuned 'mtry' on different bootstrap samples drawn from the one realization of the test set (from the beginning) that was not used to tune 'mtry'?

Thank you and sorry if my writing is a bit confusing.

Best Answer

The simple rule is that data used for evaluating the performance of a model should not have been used to optimize the model in any way. If you split all of the available data into k disjoint subsets and use them to tune the hyper-parameters of a model (e.g. the kernel and regularization parameters of an SVM), then you cannot perform unbiased performance estimation, as all of the data has influenced the selection of the hyper-parameters. This means that both (1) and (2) are likely to be optimistically biased.
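
To make the warning concrete, the sketch below (Python with scikit-learn; the dataset and grid values are illustrative assumptions) shows the kind of procedure this rules out: the same folds are used both to pick the hyper-parameter and to report the score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hyper-parameter tuned by CV on ALL of the available data ...
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {"max_features": [2, 4, 8, 16]}, cv=5)
search.fit(X, y)

# ... so reporting the same CV score as the performance estimate is
# optimistically biased: every observation influenced the choice of
# max_features.
print("biased estimate:", search.best_score_)
```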

The solution is to use nested cross-validation, where the outer cross-validation is used for performance evaluation. The key point is that we want to estimate the performance of the whole procedure for fitting the model, which includes tuning the hyper-parameters. So you need to include in each fold of the outer cross-validation all of the steps used to tune the model, which in this case means using cross-validation to tune the hyper-parameters independently within each fold. I wrote a paper on this topic, which you can find here; section 5.3 gives an example of why performing cross-validation for both model selection and performance evaluation is a bad idea.
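
A minimal sketch of nested cross-validation in Python with scikit-learn (the dataset, grid values, and fold counts are assumptions for illustration; max_features is the scikit-learn analogue of 'mtry'):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Inner CV: tunes max_features (the analogue of 'mtry') within each outer fold
inner = GridSearchCV(RandomForestClassifier(random_state=0),
                     {"max_features": [2, 4, 8, 16]}, cv=5)

# Outer CV: estimates the performance of the whole procedure,
# including the hyper-parameter tuning repeated inside every outer fold
scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Passing the inner search object to cross_val_score means the tuning is refit from scratch inside every outer fold, so the outer test data never influences the choice of the hyper-parameter.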