How to compute F-measure and accuracy for repeated cross-validation

accuracy, bias, classification, cross-validation

I am working on bug classification and am unsure how to compute the measures that capture the predictive power of my classifiers. I am running repeated cross-validation, i.e., N x K-fold CV: in each of the N runs, the dataset is randomly partitioned into K folds.

When N=1, the paper

Forman, George, and Martin Scholz. "Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement." ACM SIGKDD Explorations Newsletter 12.1 (2010): 49-57.

recommends summing the true positives (TP), false positives (FP) and false negatives (FN) over the folds and computing the F-measure from these aggregates (see p. 51 of the paper). For accuracy, it recommends computing the accuracy in each fold, summing these values and dividing the sum by K to get the overall accuracy (p. 52).
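For the N = 1 case, the two recommendations can be sketched as follows. This is a minimal illustration, assuming each fold's confusion counts are already available as `(TP, FP, FN, TN)` tuples; `pooled_f1`, `mean_fold_accuracy` and the fold counts are hypothetical, not taken from the paper. F1 is computed as 2TP / (2TP + FP + FN), which is equivalent to the harmonic mean of precision and recall.

```python
from typing import Sequence, Tuple

# One fold's confusion counts: (TP, FP, FN, TN)
Counts = Tuple[int, int, int, int]


def pooled_f1(folds: Sequence[Counts]) -> float:
    """F1 from TP, FP and FN summed over all K folds."""
    tp = sum(f[0] for f in folds)
    fp = sum(f[1] for f in folds)
    fn = sum(f[2] for f in folds)
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when there are no positives at all (0/0).
    return 2 * tp / denom if denom else 0.0


def mean_fold_accuracy(folds: Sequence[Counts]) -> float:
    """Accuracy computed within each fold, then summed and divided by K."""
    accs = [(tp + tn) / (tp + fp + fn + tn) for tp, fp, fn, tn in folds]
    return sum(accs) / len(accs)


# Hypothetical counts for one run with K = 3 folds of 20 cases each
folds = [(8, 2, 1, 9), (5, 0, 5, 10), (0, 1, 2, 17)]
print(pooled_f1(folds))           # 26/37, about 0.703
print(mean_fold_accuracy(folds))  # about 0.817
```

Note that the third fold has TP = 0, so its own F1 would be 0; pooling the counts avoids letting such a fold dominate the estimate.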

My question is: how do I calculate the overall F-measure and accuracy for N x K-CV?

  1. F-measure: Should I sum the quantities (i.e., TP, FP, FN) over the N x K runs and compute F-measure using these sums?

  2. For accuracy, should we sum the accuracy for each of the N x K runs and simply take their average for the overall estimate?

Any help is highly appreciated.

Best Answer

  1. F-measure: Should I sum the quantities (i.e., TP, FP, FN) over the N x K runs and compute F-measure using these sums?

Yes! More precisely: calculate one F1 per cross-validation run (pooling TP, FP and FN over that run's K folds) and average over the N runs. This is also a good opportunity to see how much this approach differs from calculating F1 for each fold and averaging over the folds.
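A minimal sketch of the per-run approach next to the per-fold alternative, assuming the counts for each of the N runs are stored as a list of K `(TP, FP, FN, TN)` tuples. The helper names and the counts are hypothetical and only illustrate the comparison.

```python
from statistics import mean
from typing import Sequence, Tuple

# One fold's confusion counts: (TP, FP, FN, TN)
Counts = Tuple[int, int, int, int]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 for the 0/0 case (an assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def run_f1(run: Sequence[Counts]) -> float:
    """Pool TP, FP and FN over the K folds of one run, then compute a single F1."""
    tp = sum(f[0] for f in run)
    fp = sum(f[1] for f in run)
    fn = sum(f[2] for f in run)
    return f1_from_counts(tp, fp, fn)


def repeated_cv_f1(runs: Sequence[Sequence[Counts]]) -> float:
    """Average of the N per-run pooled F1 values."""
    return mean(run_f1(run) for run in runs)


def fold_averaged_f1(runs: Sequence[Sequence[Counts]]) -> float:
    """For comparison only: one F1 per fold, averaged over all N x K folds."""
    return mean(f1_from_counts(tp, fp, fn) for run in runs for tp, fp, fn, _ in run)


# Hypothetical counts: N = 2 runs of K = 3 folds
runs = [
    [(8, 2, 1, 9), (5, 0, 5, 10), (0, 1, 2, 17)],
    [(6, 1, 3, 10), (7, 1, 2, 10), (2, 0, 1, 17)],
]
print(repeated_cv_f1(runs))    # about 0.746
print(fold_averaged_f1(runs))  # about 0.647
```

In this made-up example the per-fold average is pulled down mainly by the fold with TP = 0, which is the kind of discrepancy the comparison is meant to reveal.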

  2. For accuracy, should we sum the accuracy for each of the N x K runs and simply take their average for the overall estimate?

Also yes, this is a good approach. In applications, it is sometimes less about being 100% correct than about choosing methods and techniques that are easy to apply.
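A minimal sketch of averaging accuracy over all N x K folds, under the same hypothetical `(TP, FP, FN, TN)` layout. `pooled_accuracy` is included only to show that, when every fold has the same size, the mean of the per-fold accuracies equals the accuracy computed from pooled counts, which is one reason simple averaging is unproblematic for accuracy in a way it is not for F1.

```python
from statistics import mean
from typing import Sequence, Tuple

# One fold's confusion counts: (TP, FP, FN, TN)
Counts = Tuple[int, int, int, int]


def repeated_cv_accuracy(runs: Sequence[Sequence[Counts]]) -> float:
    """Mean of the N x K per-fold accuracies."""
    fold_accs = [
        (tp + tn) / (tp + fp + fn + tn)
        for run in runs
        for tp, fp, fn, tn in run
    ]
    return mean(fold_accs)


def pooled_accuracy(runs: Sequence[Sequence[Counts]]) -> float:
    """Correct predictions over all folds divided by all test cases."""
    correct = sum(tp + tn for run in runs for tp, _, _, tn in run)
    total = sum(sum(fold) for run in runs for fold in run)
    return correct / total


# Hypothetical counts: N = 2 runs of K = 3 equal-sized folds (20 cases each)
runs = [
    [(8, 2, 1, 9), (5, 0, 5, 10), (0, 1, 2, 17)],
    [(6, 1, 3, 10), (7, 1, 2, 10), (2, 0, 1, 17)],
]
print(repeated_cv_accuracy(runs))  # about 0.842
print(pooled_accuracy(runs))       # 101/120, also about 0.842
```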

However, whenever you are reporting the mean, please also report the variance or the standard deviation.
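A minimal sketch of reporting the spread, assuming the per-run scores (for example the N per-run F1 values) have already been computed; the scores below are made up. It uses the sample standard deviation from Python's `statistics` module. Because the N runs reuse the same data, this spread describes variability across random repartitions rather than a true sampling variance, so treat it as descriptive.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-run scores (needs at least 2 runs)."""
    return mean(scores), stdev(scores)


# Hypothetical per-run F1 values from N = 5 repetitions
per_run_f1 = [0.70, 0.74, 0.72, 0.69, 0.75]
m, s = summarize(per_run_f1)
print(f"F1 = {m:.3f} ± {s:.3f}")  # F1 = 0.720 ± 0.025
```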
