Solved – 10-fold Cross-validation vs leave-one-out cross-validation

Tags: cross-validation, machine-learning

I'm doing nested cross-validation. I have read that leave-one-out cross-validation can be biased (don't remember why).

Setting aside the longer runtime of leave-one-out cross-validation, is it better to use 10-fold cross-validation or leave-one-out cross-validation?

Best Answer

Just to add slightly to the answer of @SubravetiSuraj (+1)

Cross-validation gives a pessimistically biased estimate of performance, because most statistical models will improve if the training set is made larger. This means that k-fold cross-validation estimates the performance of a model trained on $100\times\frac{k-1}{k}\%$ of the available data, rather than on 100% of it. So if you perform cross-validation to estimate performance and then use a model trained on all of the data for operational use, it will perform slightly better than the cross-validation estimate suggests.
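
To make that concrete: with $k=10$, each fold's model is trained on $100\times\frac{9}{10}\% = 90\%$ of the data. Here is a minimal sketch (assuming scikit-learn is available; the toy data are made up) that makes the training-set sizes explicit:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(-1, 1)  # 100 toy samples
kf = KFold(n_splits=10, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each training set holds (k-1)/k = 90% of the data: 90 of 100 samples.
    print(f"fold {fold}: train size = {len(train_idx)}, test size = {len(test_idx)}")
```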

Leave-one-out cross-validation is approximately unbiased, because the difference in size between the training set used in each fold and the entire dataset is only a single pattern (one observation). There is a paper on this by Luntz and Brailovsky (in Russian):

Luntz, Aleksandr, and Viktor Brailovsky. "On estimation of characters obtained in statistical procedure of recognition." Technicheskaya Kibernetica 3.6 (1969): 6–12.

See also:

Lachenbruch, Peter A., and M. Ray Mickey. "Estimation of Error Rates in Discriminant Analysis." Technometrics 10.1 (1968): 1–11.

However, while leave-one-out cross-validation is approximately unbiased, it tends to have a high variance (so you would get very different estimates if you repeated the estimation with different initial samples of data from the same distribution). As the error of the estimator is a combination of bias and variance, whether leave-one-out cross-validation is better than 10-fold cross-validation depends on both quantities.
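
One way to check this empirically is to repeat both estimators on many independent samples from the same distribution and compare their spreads. A simulation sketch follows; the dataset size, model, and noise level are illustrative assumptions (not from the papers above), and the outcome of the comparison can depend on the model and data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
loo_estimates, kfold_estimates = [], []

# Draw 50 independent datasets of 40 samples from the same distribution
# and compute both accuracy estimates on each one.
for rep in range(50):
    X = rng.normal(size=(40, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=40) > 0).astype(int)
    model = LogisticRegression()
    loo_estimates.append(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())
    kfold_estimates.append(
        cross_val_score(model, X, y,
                        cv=KFold(n_splits=10, shuffle=True, random_state=rep)).mean())

# The spread (standard deviation) across datasets is the estimator's variance.
print(f"LOOCV:   mean={np.mean(loo_estimates):.3f}  sd={np.std(loo_estimates):.3f}")
print(f"10-fold: mean={np.mean(kfold_estimates):.3f}  sd={np.std(kfold_estimates):.3f}")
```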

Now the variance of the fitted model tends to be higher if it is fitted to a small dataset (as it is more sensitive to any noise/sampling artifacts in the particular training sample used). This means that 10-fold cross-validation is likely to have high variance (as well as higher bias) if you only have a limited amount of data, as its training sets are smaller than those used by LOOCV. So k-fold cross-validation can have variance issues as well, but for a different reason. This is why LOOCV is often better when the dataset is small.

However, the main reason for using LOOCV, in my opinion, is that it is computationally inexpensive for some models (such as linear regression, most kernel methods, nearest-neighbour classifiers, etc.). Unless the dataset were very small, I would use 10-fold cross-validation if it fitted within my computational budget, or, better still, bootstrap estimation and bagging.
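
To illustrate the "computationally inexpensive" point for linear regression specifically: the leave-one-out residuals can be recovered from a single least-squares fit via the hat matrix, $e_{(i)} = e_i / (1 - h_{ii})$ (the classical PRESS shortcut), so LOOCV costs essentially one model fit. A minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])  # design matrix with intercept
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=30)

beta = np.linalg.solve(X.T @ X, X.T @ y)                  # one least-squares fit
resid = y - X @ beta                                      # ordinary residuals e_i
h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)  # leverages h_ii
loo_resid = resid / (1 - h)                               # leave-one-out residuals e_(i)

print("LOOCV MSE (closed form):", np.mean(loo_resid**2))

# Sanity check: explicitly refit without the first point and compare.
b = np.linalg.solve(X[1:].T @ X[1:], X[1:].T @ y[1:])
assert np.isclose(y[0] - X[0] @ b, loo_resid[0])
```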