Solved – K nearest neighbors with nested cross validation

cross-validation, k nearest neighbour, model selection, model-evaluation

I'm working on a binary classification problem on this dataset, using the k-NN algorithm.

For performance evaluation and parameter tuning (i.e. choosing k) I'm using nested cross-validation.

I split my dataset into 5 equal-sized folds, and then for every outer fold I ran an inner cross-validation on its training set (splitting that training set into 5 equal-sized folds) to tune k. I used a fixed set of candidate values for k (1, 3, 5, 7, 9, 11, 13, etc.).

For every inner cross-validation I took the best k and used it to evaluate the corresponding outer fold.
I've drawn an example schema as an explanation:

[Schema: nested cross-validation]
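As a rough sketch, this setup could look like the following in Python with scikit-learn. The dataset here is a synthetic placeholder, and the fold counts and k grid simply mirror the description above; it is an illustration of the procedure, not the exact code I used.

```python
# Minimal sketch of the nested cross-validation described above (placeholder data).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)   # placeholder binary dataset

param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11, 13]}        # candidate k values
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)   # inner split: k tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)   # outer split: performance estimate

# Inner loop: GridSearchCV picks the best k on each outer training set.
knn_search = GridSearchCV(KNeighborsClassifier(), param_grid,
                          cv=inner_cv, scoring="accuracy")

# Outer loop: each outer fold is scored with the k chosen by the inner loop.
outer_scores = cross_val_score(knn_search, X, y, cv=outer_cv, scoring="accuracy")
print(outer_scores, outer_scores.mean())
```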

I got, for example, these results (5-fold cross-validation):

  • First fold, best k = 11, accuracy = 0.785
  • Second fold, best k = 11, accuracy = 0.776
  • Third fold, best k = 11, accuracy = 0.786
  • Fourth fold, best k = 11, accuracy = 0.791
  • Fifth fold, best k = 9, accuracy = 0.793

This gives an overall performance of 0.7853669 (mean of the fold accuracies).

Now, since I don't get the same best k for every fold (the selection is done with the inner cross-validation), which k should I use for my final model (the one I will use for real classification)?

  • Does it make sense to use the mean of the best k values?

  • Or should I run an inner-style cross-validation on the whole dataset for the final k selection, and state that the expected performance is the one estimated with the nested cross-validation?

Best Answer

I found the answer to my question:

Should I run a cross-validation on the whole dataset for the final k selection, and state that the expected performance is the one estimated with the nested cross-validation?

  • Short answer: yes.

  • Long answer: nested cross-validation is needed to evaluate the whole procedure of learning and hyperparameter tuning. That means that, in the end, if I want to select a k for my final model, I apply the same inner cross-validation procedure (the one used on each outer training set) to the entire dataset, as sketched below. The expected performance of this final model is the one estimated earlier with the nested cross-validation.
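Concretely, that final step could be sketched like this (again with scikit-learn and placeholder data; the grid and fold count are assumptions mirroring the setup above):

```python
# Sketch of the final step: tune k on the whole dataset with the same inner
# cross-validation, then fit the model used for real classification.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)   # placeholder binary dataset
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11, 13]}        # same candidate k values

final_search = GridSearchCV(KNeighborsClassifier(), param_grid,
                            cv=KFold(n_splits=5, shuffle=True, random_state=1),
                            scoring="accuracy")
final_search.fit(X, y)                       # k selection on the full dataset
final_model = final_search.best_estimator_   # the model to deploy
print("selected k:", final_search.best_params_["n_neighbors"])
```

Note that the performance of this final model is not re-estimated on the same data: the nested cross-validation mean reported above (0.7853669) is the estimate of its expected accuracy.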