Solved – Differences between cross validation and bootstrapping to estimate the prediction error

bootstrap, cross-validation, predictive-models

I would like your thoughts about the differences between cross validation and bootstrapping to estimate the prediction error.

Does one work better than the other for small datasets or for large datasets?

Best Answer

It comes down to variance and bias (as usual). CV tends to be less biased, but K-fold CV has fairly large variance. On the other hand, bootstrapping tends to drastically reduce the variance but gives more biased results (the estimates tend to be pessimistic). Other bootstrap methods have been adapted to deal with this bias, such as the .632 and .632+ rules.
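For concreteness, here is a minimal sketch of the .632 estimator (err_.632 = 0.368 × resubstitution error + 0.632 × out-of-bag error) under 0-1 loss, assuming numpy arrays and a scikit-learn-style classifier; the helper name, data, and settings are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import zero_one_loss


def bootstrap_632_error(model, X, y, n_boot=200, random_state=0):
    """0.632 bootstrap estimate of prediction error (0-1 loss):
    err_632 = 0.368 * resubstitution error + 0.632 * out-of-bag error."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    oob_errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # bootstrap sample: n draws with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # points not drawn (~36.8% on average)
        if oob.size == 0:
            continue
        model.fit(X[idx], y[idx])
        oob_errors.append(zero_one_loss(y[oob], model.predict(X[oob])))
    err_oob = np.mean(oob_errors)                  # pessimistic: each fit sees ~63.2% unique points
    model.fit(X, y)
    err_resub = zero_one_loss(y, model.predict(X))  # optimistic resubstitution error
    return 0.368 * err_resub + 0.632 * err_oob


# Illustrative usage on a synthetic dataset
X, y = make_classification(n_samples=80, random_state=1)
print(bootstrap_632_error(LogisticRegression(max_iter=1000), X, y))
```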

Two other approaches are worth mentioning. "Monte Carlo CV", aka "leave-group-out CV", does many random splits of the data (sort of like many mini training/test splits); its variance is very low and the bias isn't too bad if the percentage of data in the hold-out is low. Repeated CV runs K-fold several times and averages the results, just like regular K-fold. I'm most partial to repeated CV since it keeps the low bias and reduces the variance.
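As a rough sketch of what these two schemes look like in practice, here is one way to run them with scikit-learn's ShuffleSplit (Monte Carlo CV) and RepeatedKFold; the dataset, split sizes, and repeat counts are only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000)

# Monte Carlo CV / leave-group-out CV: many random train/test splits
mc_cv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)
mc_scores = cross_val_score(model, X, y, cv=mc_cv)

# Repeated K-fold CV: run 10-fold CV several times and average the results
rep_cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
rep_scores = cross_val_score(model, X, y, cv=rep_cv)

print(f"Monte Carlo CV accuracy:      {mc_scores.mean():.3f} +/- {mc_scores.std():.3f}")
print(f"Repeated 10-fold CV accuracy: {rep_scores.mean():.3f} +/- {rep_scores.std():.3f}")
```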

Edit

For large sample sizes, the variance issues become less important and the computational cost becomes more of an issue. I would still stick with repeated CV for both small and large sample sizes.

Some relevant research is below (especially Kim and Molinaro).

References

Bengio, Y., & Grandvalet, Y. (2005). Bias in estimating the variance of k-fold cross-validation. Statistical modeling and analysis for complex data problems, 75–95.

Braga-Neto, U. M. (2004). Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3), 374–380. doi:10.1093/bioinformatics/btg419

Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 316–331.

Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 548–560.

Furlanello, C., Merler, S., Chemini, C., & Rizzoli, A. (1997). An application of the bootstrap 632+ rule to ecological data. WIRN 97.

Jiang, W., & Simon, R. (2007). A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Statistics in Medicine, 26(29), 5320–5334.

Jonathan, P., Krzanowski, W., & McCarthy, W. (2000). On the use of cross-validation to assess performance in multivariate prediction. Statistics and Computing, 10(3), 209–229.

Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53(11), 3735–3745. doi:10.1016/j.csda.2009.04.009

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14, 1137–1145.

Martin, J., & Hirschberg, D. (1996). Small sample statistics for classification error rates I: Error rate measurements.

Molinaro, A. M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301–3307. doi:10.1093/bioinformatics/bti499

Sauerbrei, W., & Schumacher, M. (2000). Bootstrap and cross-validation to assess complexity of data-driven regression models. Medical Data Analysis, 26–28.

Tibshirani, R. J., & Tibshirani, R. (2009). A bias correction for the minimum error rate in cross-validation. arXiv preprint arXiv:0908.2904.