You can add to that list:
- Repeated cross-validation
- Leave-group-out cross-validation
- Out-of-bag (for random forests and other bagged models)
- The 632+ bootstrap
I don't have much advice on how or when to use each of these techniques. You can use the caret package in R to compare CV, the bootstrap, boot632, leave-one-out, leave-group-out, and out-of-bag error estimation.
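For instance, here is a minimal sketch of setting up those resampling schemes in caret (the random forest and the iris data are just placeholders, not a recommendation):

```r
# Minimal caret sketch: each trainControl() picks a different resampling
# scheme; swap the one you want into trControl below.
library(caret)

ctrl_cv   <- trainControl(method = "cv", number = 10)              # 10-fold CV
ctrl_rep  <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
ctrl_boot <- trainControl(method = "boot", number = 50)            # bootstrap
ctrl_632  <- trainControl(method = "boot632", number = 50)         # 632 bootstrap
ctrl_loo  <- trainControl(method = "LOOCV")                        # leave-one-out
ctrl_lgo  <- trainControl(method = "LGOCV", p = 0.9, number = 50)  # leave-group-out
ctrl_oob  <- trainControl(method = "oob")                          # out-of-bag (bagged models only)

fit <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl_632)
fit$results  # resampled performance under the chosen scheme
```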
In general, I usually use the bootstrap because it is less computationally intensive than repeated k-fold CV or leave-one-out CV. Boot632 is my algorithm of choice because it doesn't require much more computation than the basic bootstrap, and it has been shown to beat both cross-validation and the basic bootstrap in certain situations.
I almost always use out-of-bag error estimates for random forests, rather than cross-validation. Out-of-bag errors are generally unbiased, and random forests take long enough to compute as it is.
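For random forests the OOB estimate comes free with the fit; a sketch with the randomForest package (iris again as a placeholder):

```r
# The out-of-bag error is computed while the forest is grown,
# so no extra resampling loop is needed.
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)
rf$err.rate[rf$ntree, "OOB"]  # OOB error rate after the last tree
```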
It comes down to variance and bias (as usual). CV tends to be less biased but K-fold CV has fairly large variance. On the other hand, bootstrapping tends to drastically reduce the variance but gives more biased results (they tend to be pessimistic). Other bootstrapping methods have been adapted to deal with the bootstrap bias (such as the 632 and 632+ rules).
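For reference, the .632 rule of Efron (1983) counteracts the pessimism of the leave-one-out bootstrap error $\widehat{\mathrm{Err}}^{(1)}$ (error measured only on observations left out of each bootstrap resample) by blending it with the optimistic apparent (training) error $\overline{\mathrm{err}}$:

$$\widehat{\mathrm{Err}}^{(.632)} = 0.368\,\overline{\mathrm{err}} + 0.632\,\widehat{\mathrm{Err}}^{(1)},$$

and the 632+ rule of Efron & Tibshirani (1997) replaces the fixed 0.632 weight with one that grows with the estimated amount of overfitting.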
Two other approaches are "Monte Carlo CV", a.k.a. "leave-group-out CV", which does many random splits of the data (like many small training/test splits). Variance is very low for this method, and the bias isn't too bad if the percentage of data in the hold-out is low. Also, repeated CV runs K-fold several times and averages the results, similar to regular K-fold. I'm most partial to this since it keeps the low bias and reduces the variance.
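A hand-rolled version of Monte Carlo CV makes the idea concrete (the 90/10 splits, 50 repeats, and random forest here are arbitrary choices):

```r
# Monte Carlo CV sketch: many random 90/10 train/hold-out splits,
# with the hold-out errors averaged across splits.
set.seed(1)
n <- nrow(iris)
errs <- replicate(50, {
  holdout <- sample(n, size = round(0.1 * n))
  fit  <- randomForest::randomForest(Species ~ ., data = iris[-holdout, ])
  pred <- predict(fit, iris[holdout, ])
  mean(pred != iris$Species[holdout])
})
mean(errs)  # Monte Carlo CV estimate of the error rate
```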
Edit
For large sample sizes, the variance issues become less important and the computational cost is more of an issue. I would still stick with repeated CV for both small and large sample sizes.
Some relevant research is listed below (especially Kim, 2009, and Molinaro, 2005).
References
Bengio, Y., & Grandvalet, Y. (2005). Bias in estimating the variance of k-fold cross-validation. Statistical Modeling and Analysis for Complex Data Problems, 75–95.
Braga-Neto, U. M. (2004). Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3), 374–380. doi:10.1093/bioinformatics/btg419
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 316–331.
Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 548–560.
Furlanello, C., Merler, S., Chemini, C., & Rizzoli, A. (1997). An application of the bootstrap 632+ rule to ecological data. WIRN 97.
Jiang, W., & Simon, R. (2007). A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Statistics in Medicine, 26(29), 5320–5334.
Jonathan, P., Krzanowski, W., & McCarthy, W. (2000). On the use of cross-validation to assess performance in multivariate prediction. Statistics and Computing, 10(3), 209–229.
Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53(11), 3735–3745. doi:10.1016/j.csda.2009.04.009
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14, 1137–1145.
Martin, J., & Hirschberg, D. (1996). Small sample statistics for classification error rates I: Error rate measurements.
Molinaro, A. M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301–3307. doi:10.1093/bioinformatics/bti499
Sauerbrei, W., & Schumacher, M. (2000). Bootstrap and cross-validation to assess complexity of data-driven regression models. Medical Data Analysis, 26–28.
Tibshirani, R. J., & Tibshirani, R. (2009). A bias correction for the minimum error rate in cross-validation. arXiv preprint arXiv:0908.2904.
Best Answer
The AUC is equivalent to the c-index or concordance, the fraction of pairs of cases in which the ordering of the predictor value (based on the combination of predictor variables) is consistent with differences in outcome. So in principle if you wanted to do cross-validation you could use the paired comparisons of the held-out cases to calculate a concordance index (and thus an AUC) for each fold of CV.
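As a sketch, the concordance calculation itself is just a count over pairs; here `score` and `y` are hypothetical held-out predicted probabilities and 0/1 outcomes for one fold:

```r
# c-index / AUC from held-out scores: the fraction of
# (positive, negative) pairs ranked concordantly, ties counted as half.
c_index <- function(score, y) {
  pos <- score[y == 1]
  neg <- score[y == 0]
  pairs <- expand.grid(pos = pos, neg = neg)  # all cross-outcome pairs
  mean((pairs$pos > pairs$neg) + 0.5 * (pairs$pos == pairs$neg))
}
```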
But to get enough comparisons to be useful you would probably have to do not just one CV but multiple repeated cross-validations with different subsettings of the cases. The many iterations of CV may offset your initial idea that fewer computing resources are required for CV than for bootstrapping. I find the bootstrap more straightforward and have used it for estimating standard errors of AUC values. If you only have on the order of 100 observations, there shouldn't be that large a computational cost, so check the bootstrap algorithm you are using.
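A nonparametric bootstrap of the AUC is then only a few lines (reusing the hypothetical `score`, `y`, and the `c_index()` helper from the sketch above):

```r
# Resample cases with replacement and recompute the AUC each time;
# the SD of the replicates estimates the standard error.
set.seed(1)
boot_auc <- replicate(2000, {
  i <- sample(length(y), replace = TRUE)
  c_index(score[i], y[i])
})
sd(boot_auc)  # bootstrap standard error of the AUC
```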
My hesitation is that although AUC might be considered a "neutral" evaluation metric, classifiers are typically used in a situation where there are different costs to false-positive and false-negative determinations. It's not clear that you would necessarily get the same result by "maximizing AUC directly" as you would by evaluating the cost-benefit tradeoffs in the context of how you plan to use your results. And your use of the word "maximizing" suggests that you might be trying to use AUC to compare among models, which might be better done with a different measure like the Akaike Information Criterion; see the accepted answer on this page and its comments.