How to compare accuracy from k-fold cross-validation with different k
I need to compare the accuracy and F1 measures of my machine learning classifier (C1) with a state-of-the-art classifier (C2). However, the paper that proposed C2 evaluated it with 3-fold cross-validation.
Is it possible to compare 10-fold CV on C1 with 3-fold CV on C2?
If yes, could you teach me how?
I am not sure about this, since a larger k means a larger training set in each fold, which might make the comparison unfair.
Best Answer
Cross validation is a technique to estimate the generalization error of a model. Comparing generalization error of two models M1 and M2 is certainly possible, and not restricted to $k$-fold cross validation (nor to equal $k$).
From your post it is not entirely clear whether you actually want to compare models M1 and M2 or training algorithms A1 and A2.
Comparing training algorithms via cross validation makes stronger assumptions than comparing the predictive performance of specific (fully trained) models. Also, resampling validation (to which cross validation belongs) cannot fully measure the variance for algorithm comparison, which usually leads to the additional implicit assumption that this doesn't matter...
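As background, here is a minimal sketch of what estimating accuracy and F1 with k-fold cross validation looks like in scikit-learn; the data set, classifier, and k below are placeholders standing in for your actual setup:

```python
# Minimal sketch (assumed setup): estimate accuracy and F1 of one
# classifier with k-fold cross validation. Data and classifier are
# placeholders, not your actual C1.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
clf = LogisticRegression(max_iter=1000)                    # stand-in for C1

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(clf, X, y, cv=cv, scoring=["accuracy", "f1"])

print("accuracy: %.3f +/- %.3f" % (scores["test_accuracy"].mean(),
                                   scores["test_accuracy"].std()))
print("F1:       %.3f +/- %.3f" % (scores["test_f1"].mean(),
                                   scores["test_f1"].std()))
```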
The most powerful way to make such comparisons is a so-called paired design (read up on paired design of experiments), where you compare the performance of both models (or algorithms) on exactly the same data: that way, you are sure that both models (or algorithms) have to deal with problems of exactly the same difficulty.
In order to do that, you need access to the exact data (including the cross validation splits) used for the reference classifier M2, and then use the same data to train and test your model M1. If comparing algorithms, you can also work with a reference implementation of A2 and then train models M2i with A2 and models M1i with your algorithm A1, both on the same splits (and many of those).
You can then compare the predictions pairwise: same test case, M1 prediction vs. M2 prediction.
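Here is a rough sketch of how such a paired comparison could look, assuming you can rerun both algorithms on the same splits. The classifiers and data are placeholders, and McNemar's test on the paired correct/incorrect outcomes is just one common way to summarize the case-by-case comparison:

```python
# Minimal sketch (assumed setup): paired comparison of two algorithms
# on exactly the same cross validation splits, evaluated per test case.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from statsmodels.stats.contingency_tables import mcnemar

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
A1 = RandomForestClassifier(random_state=0)   # stand-in for your algorithm
A2 = LogisticRegression(max_iter=1000)        # stand-in for the reference

correct1, correct2 = [], []
for train, test in StratifiedKFold(n_splits=10, shuffle=True,
                                   random_state=0).split(X, y):
    p1 = A1.fit(X[train], y[train]).predict(X[test])
    p2 = A2.fit(X[train], y[train]).predict(X[test])
    correct1.append(p1 == y[test])   # per-case outcome, same test cases
    correct2.append(p2 == y[test])

c1 = np.concatenate(correct1)
c2 = np.concatenate(correct2)

# 2x2 table of paired outcomes: how often each model is (in)correct
# on the very same test case.
table = [[np.sum(c1 & c2), np.sum(c1 & ~c2)],
         [np.sum(~c1 & c2), np.sum(~c1 & ~c2)]]
print(mcnemar(table, exact=True))
```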
If that is not possible, because all you really have is the publication about the reference classifier, the uncertainty in the comparison is larger. I.e., the "I don't know whether my classifier is better" zone is wider.
Bias and variance
Classifier testing, like any other measurement, is subject to systematic error (bias) and random error (variance).
Correctly implemented k-fold cross validation has a small pessimistic bias. This bias may be different for your 10-fold and the published 3-fold CV results, but it is hard to say anything further:
This hinges on the cross validation splits being statistically independent: if you have groups in your data, read up on independent splitting and confounding variables.
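For example, a small sketch of group-aware splitting with scikit-learn's GroupKFold, assuming a hypothetical grouping variable (say, several samples per patient):

```python
# Minimal sketch (assumed setup): if samples come in groups (e.g. several
# measurements per patient), keep whole groups in either the training or
# the test fold. GroupKFold does this; a plain k-fold split would leak
# group information across the split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # stand-in data
groups = np.repeat(np.arange(100), 5)   # hypothetical: 100 patients x 5 samples

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, groups=groups,
                         cv=GroupKFold(n_splits=5), scoring="accuracy")
print(scores.mean())
```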
Actually comparing the two classifiers
Depending on your sample size, a quick look at an approximate confidence interval (e.g., for accuracy, treating the n tested cases as binomial trials) may already tell you all you need.
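For illustration, a rough sketch of such an interval for an observed accuracy; the counts below are made up, and the Wilson interval is just one reasonable choice:

```python
# Minimal sketch (assumed numbers): approximate confidence interval for
# an observed accuracy, treating the n test cases as Bernoulli trials.
from statsmodels.stats.proportion import proportion_confint

n_test = 300        # hypothetical number of test cases
n_correct = 255     # hypothetical number of correct predictions

low, high = proportion_confint(n_correct, n_test, alpha=0.05, method="wilson")
print("accuracy %.3f, 95%% CI [%.3f, %.3f]" % (n_correct / n_test, low, high))
```

If the interval around your 10-fold result and the published 3-fold result clearly do not overlap, the remaining differences due to k are unlikely to change the conclusion; if they do overlap, you are back in the wider "I don't know" zone discussed above.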