Solved – Lack of understanding of LOOCV

Tags: confusion matrix, cross-validation, k nearest neighbour, machine learning, r

I am trying to use LOOCV to partition my data in R. The idea of LOOCV is to train the model on n-1 observations and test it on the single remaining observation, then repeat this process n times.

Now suppose that I am using KNN. That means that on each repetition of LOOCV I will get a confusion matrix to assess my model, which is what I want.

My dataset has 569 rows and 32 columns, so with LOOCV I will have 569 confusion matrices in total. The reason for calculating the confusion matrix is to find the number of correctly classified observations.

Am I right, or do I have something wrong?

If I am right, how do I assess the model?

Best Answer

Each of the 569 leave-one-out folds produces exactly one prediction, e.g. P(+) = 0.43. You then apply a threshold to this probability to binarise it to 0 or 1 and compare the binary prediction with the actual label, so each fold yields exactly one of {TP, TN, FP, FN}. Since there is only one prediction per fold, constructing a confusion matrix for each fold is not meaningful.
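
To make the single-fold case concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself concerns R). The function name `fold_outcome` and the 0.5 threshold are assumptions, not something from the question.

```python
def fold_outcome(p_positive, actual, threshold=0.5):
    """Map one held-out prediction to exactly one of TP, TN, FP, FN.

    p_positive: predicted probability of the positive class for the held-out point
    actual:     true label of the held-out point (1 = positive, 0 = negative)
    threshold:  cut-off used to binarise the probability (0.5 is an assumption)
    """
    predicted = 1 if p_positive >= threshold else 0
    if predicted == 1:
        return "TP" if actual == 1 else "FP"
    return "FN" if actual == 1 else "TN"


# The example probability from the text, P(+) = 0.43, is below the threshold,
# so the held-out point is predicted negative.
print(fold_outcome(0.43, actual=1))  # FN
print(fold_outcome(0.43, actual=0))  # TN
```

A fold therefore contributes a single count to one cell, which is why a per-fold confusion matrix carries no extra information.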

Instead, pool the results of all 569 folds: accuracy, recall, and the other confusion-matrix-based metrics can be computed from the aggregated LOOCV outcomes (accuracy, for instance, is simply the mean of the per-fold correct/incorrect results). For the reasons explained above, constructing a confusion matrix per fold does not make much sense.
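
One way to do this aggregation is sketched below: a plain-Python LOOCV loop around a simple majority-vote KNN, with all single-point outcomes pooled into one confusion matrix. This is an illustration under stated assumptions, not the asker's setup: the helper names (`knn_predict`, `loocv_confusion`), the choice k = 3, and the eight-point toy dataset are all made up for the example, and the KNN here returns a class label directly rather than a thresholded probability.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loocv_confusion(X, y, k=3):
    """Run LOOCV and pool the n single-point outcomes into ONE confusion matrix."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for i in range(len(X)):
        # Train on every observation except the i-th, then predict the i-th.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = knn_predict(train_X, train_y, X[i], k)
        if pred == 1:
            counts["TP" if y[i] == 1 else "FP"] += 1
        else:
            counts["FN" if y[i] == 1 else "TN"] += 1
    return counts


# Toy data, invented for illustration: two well-separated clusters.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
     (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

cm = loocv_confusion(X, y, k=3)
accuracy = (cm["TP"] + cm["TN"]) / len(y)
recall = cm["TP"] / (cm["TP"] + cm["FN"])
print(cm, accuracy, recall)
```

With the 569-row dataset the loop would run 569 times and the four counts would sum to 569. Metrics such as recall are computed once from the pooled counts, since they are undefined on a single held-out negative.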

As for the merits of LOOCV and what to be aware of, I won't repeat what is given in this link, which I think is detailed.