Compendium of cross-validation techniques

cross-validation

I'm wondering if anybody knows of a compendium of cross-validation techniques with a discussion of the differences between them and a guide on when to use each of them. Wikipedia has a list of the most common techniques, but I'm curious if there are other techniques, and if there are any taxonomies for them.

For example, I just ran into a library that allows me to choose one of the following strategies:

  • Hold out
  • Bootstrap
  • K Cross-validation
  • Leave one out
  • Stratified Cross Validation
  • Balanced Stratified Cross Validation
  • Stratified Hold out
  • Stratified Bootstrap

and I am trying to understand what "stratified" and "balanced" mean in the context of bootstrapping, hold-out, or CV.
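
For what it's worth, stratification generally means the split preserves the outcome's class proportions in each partition. A minimal illustration of stratified versus plain hold-out in R, assuming the caret package is available (its createDataPartition samples within each level of the outcome):

    library(caret)
    data(iris)
    set.seed(1)

    # Plain hold-out: rows drawn uniformly at random
    plain_idx <- sample(seq_len(nrow(iris)), size = floor(0.7 * nrow(iris)))

    # Stratified hold-out: rows drawn within each class, so the 70/30
    # split keeps the class proportions of the full data set
    strat_idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)

    table(iris$Species[plain_idx])   # counts can drift by chance
    table(iris$Species[strat_idx])   # 35 of each class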

We can also turn this post into a community wiki if people want, and collect a discussion of techniques or taxonomies here.

Best Answer

You can add to that list:

  • Repeated cross-validation
  • Leave-group-out cross-validation
  • Out-of-bag (for random forests and other bagged models)
  • The 632+ bootstrap

I don't really have a lot of advice on how or when to use these techniques. You can use the caret package in R to compare CV, Boot, Boot632, leave-one-out, leave-group-out, and out-of-bag cross-validation.
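
As a rough sketch of how that comparison is set up (assuming caret plus the randomForest package are installed; iris is only a stand-in data set and the resample counts are arbitrary), the resampling scheme is chosen through trainControl() and passed to train():

    library(caret)
    data(iris)
    set.seed(1)

    # Each resampling scheme is selected via trainControl()
    ctrl_cv   <- trainControl(method = "cv",      number = 10)  # 10-fold CV
    ctrl_boot <- trainControl(method = "boot",    number = 50)  # bootstrap
    ctrl_632  <- trainControl(method = "boot632", number = 50)  # 0.632 bootstrap
    ctrl_loo  <- trainControl(method = "LOOCV")                 # leave-one-out
    ctrl_lgo  <- trainControl(method = "LGOCV", p = 0.8, number = 25)  # leave-group-out
    ctrl_oob  <- trainControl(method = "oob")                   # out-of-bag (bagged models only)

    # Same model, two different resampling estimates of accuracy
    fit_boot <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl_boot)
    fit_oob  <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl_oob)
    fit_boot$results
    fit_oob$results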

In general, I use the bootstrap because it is less computationally intensive than repeated k-fold CV or leave-one-out CV. Boot632 is my algorithm of choice because it doesn't require much more computation than the bootstrap, and it has been shown to be better than cross-validation or the basic bootstrap in certain situations.

I almost always use out-of-bag error estimates for random forests, rather than cross-validation. Out-of-bag errors are generally unbiased, and random forests take long enough to compute as it is.
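
As a minimal sketch of that workflow (assuming the randomForest package; iris is just a placeholder data set), the OOB estimate falls out of the fit itself, since each tree is evaluated on the rows left out of its bootstrap sample:

    library(randomForest)
    data(iris)
    set.seed(1)

    rf <- randomForest(Species ~ ., data = iris, ntree = 500)

    # OOB error: each tree is tested on the observations missing from its
    # bootstrap sample, so no separate cross-validation loop is needed
    print(rf)                      # reports the OOB error rate and confusion matrix
    rf$err.rate[rf$ntree, "OOB"]   # OOB error after all 500 trees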