KNN classifier + cross validation

Tags: accuracy, classification, cross-validation, k-nearest-neighbour

How can I compute the accuracy (or error rate) for each fold of a k-fold cross-validation of a k-nearest-neighbour (KNN) classification model, and then the mean and standard deviation of that metric across the folds?

Best Answer

The mean and standard deviation of your metrics are calculated across the results of all cross-validation (CV) partitions. So, if you run 10-fold CV with 10 repeats, you obtain 100 sets of metrics (one per held-out fold), which are then used to compute the mean and standard deviation of each metric. This is not specific to KNN; it applies to any model evaluated with CV, so it should also answer your other question.
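In symbols: with $K$ folds repeated $R$ times there are $n = KR$ per-partition scores $a_1, \dots, a_n$ (for example, accuracies). The summaries are the ordinary sample mean and sample standard deviation. R's `sd()` uses the $n-1$ denominator shown here. Because the error rate of partition $i$ is $e_i = 1 - a_i$, its mean is $1 - \bar{a}$ and its standard deviation equals that of the accuracy.

```latex
\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i, \qquad
s_a = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(a_i - \bar{a}\right)^2}, \qquad n = KR

\bar{e} = \frac{1}{n}\sum_{i=1}^{n} (1 - a_i) = 1 - \bar{a}, \qquad s_e = s_a
```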

If you are using software such as R with the caret package, this is already computed for you, so there is no need to do it yourself. For the purpose of understanding, here is a minimal working example that also calculates it by hand:

> library(caret)
> m <- train(iris[,1:4], 
>            iris[,5], 
>            method = 'knn', 
>            tuneGrid = expand.grid(k=1),
>            trControl=trainControl(method='repeatedcv', 
>                                   number=10, 
>                                   repeats=10))
> print(m)
    [...]
    Resampling results

    Accuracy  Kappa  Accuracy SD  Kappa SD
    0.96      0.94   0.0454       0.0682

> head(m$resample) # performances for individual partitions
       Accuracy Kappa     Resample
    1 0.9333333   0.9 Fold01.Rep01
    2 1.0000000   1.0 Fold02.Rep01
    3 1.0000000   1.0 Fold03.Rep01
    4 1.0000000   1.0 Fold04.Rep01
    5 0.9333333   0.9 Fold05.Rep01
    6 1.0000000   1.0 Fold06.Rep01
    [...]

> print(apply(m$resample[, 1:2], MARGIN = 2, FUN = mean)) # mean of each metric across all partitions
    Accuracy    Kappa 
        0.96     0.94

> print(apply(m$resample[, 1:2], MARGIN = 2, FUN = sd)) # standard deviation of each metric across all partitions
    Accuracy      Kappa 
    0.04544332 0.06816498
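To get the per-fold accuracies without caret, the same procedure can be written out by hand. Below is a minimal sketch in base R. It assumes the `class` package, which provides `knn()` and ships with standard R installations. The helper name `cv_knn` and its arguments are hypothetical. caret balances the class distribution across folds, but this sketch assigns rows to folds purely at random, so its numbers will not match the caret output above exactly.

```r
# Sketch: repeated k-fold cross-validation of a KNN classifier by hand,
# keeping the accuracy of every held-out fold. Assumes the 'class' package.
library(class)  # provides knn()

# Hypothetical helper: returns one accuracy per (repeat, fold) pair
cv_knn <- function(x, y, k = 1, folds = 10, repeats = 10, seed = 1) {
  set.seed(seed)                      # reproducible fold assignment
  n <- nrow(x)
  acc <- numeric(folds * repeats)     # preallocate the result vector
  for (r in seq_len(repeats)) {
    # Randomly assign each row to one of `folds` (near) equal-sized folds
    fold_id <- sample(rep(seq_len(folds), length.out = n))
    for (f in seq_len(folds)) {
      held_out <- fold_id == f
      pred <- knn(train = x[!held_out, ], test = x[held_out, ],
                  cl = y[!held_out], k = k)
      # Accuracy on this held-out fold
      acc[(r - 1) * folds + f] <- mean(pred == y[held_out])
    }
  }
  acc
}

acc <- cv_knn(iris[, 1:4], iris[, 5], k = 1)
length(acc)    # 10 folds x 10 repeats = 100 accuracies
mean(acc)      # mean accuracy across all partitions
sd(acc)        # standard deviation of accuracy
mean(1 - acc)  # mean error rate
sd(1 - acc)    # same as sd(acc)
```

The error rate of each fold is one minus its accuracy, so `sd(1 - acc)` equals `sd(acc)`. Only the mean changes.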