Cross-Validation – Hold-Out Validation vs. Cross-Validation

cross-validation, machine learning, validation

To me, it seems that hold-out validation is useless. That is, splitting the original dataset into two parts (training and testing) and using the testing score as a measure of generalization seems somewhat pointless.

K-fold cross-validation seems to give a better approximation of generalization (as it trains and tests on every point). So why would we use standard hold-out validation, or even talk about it?
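For concreteness, here is a minimal Python sketch of the two splitting schemes being compared. The function names `holdout_split` and `kfold_splits`, the 20% test fraction, and the toy size of 10 points are illustrative assumptions, not part of any particular library. It only shows which points end up being tested: hold-out scores a single fraction of the data, while the k test folds together cover every point exactly once.

```python
import random

def holdout_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 once and cut them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(n * test_fraction)))
    return idx[n_test:], idx[:n_test]

def kfold_splits(n, k=5, seed=0):
    """Return k (train, test) pairs; the test folds partition 0..n-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]

# Hold-out: only the test part (here 2 of 10 points) is ever scored.
train, test = holdout_split(10)
print(len(train), len(test))  # 8 2

# K-fold: across the k rounds, every point is tested exactly once.
tested = sorted(j for _, fold in kfold_splits(10, k=5) for j in fold)
print(tested == list(range(10)))  # True
```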

Best Answer

NOTE: This answer is old, incomplete, and thoroughly out of date. It was only debatably correct when it was posted in 2014, and I'm not really sure how it got so many upvotes or how it became the accepted answer. I recommend this answer instead, written by an expert in the field (and with significantly more upvotes). I am leaving my answer here for historical/archival purposes only.


My only guess is that you can do hold-out validation with three hours of programming experience; k-fold cross-validation takes a week in principle and six months in practice.

In principle it's simple, but writing the code is tedious and time-consuming. As Linus Torvalds famously said, "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." Many of the people doing statistics are bad programmers, through no fault of their own. Doing k-fold cross-validation efficiently in R (and by that I mean in a way that isn't horribly frustrating to debug and use more than once) requires at least a vague understanding of data structures, but data structures are generally skipped in "intro to statistical programming" tutorials. It's like an older person using the Internet for the first time: it's really not hard, and it only takes an extra half hour or so to figure out, but it's brand new and that makes it confusing, so it's easy to ignore.
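To illustrate the data-structure point, here is a minimal sketch in Python rather than R (a choice made purely for illustration). The idea is to build the folds once as a plain list of index lists and pass that list to a generic scoring loop, so the same folds can be reused for any model. All names (`make_folds`, `cross_validate`, `fit_slope`, `predict_slope`) and the toy noise-free data are hypothetical.

```python
import random
from statistics import mean

def make_folds(n, k, seed=0):
    """Build the folds once: a list of k disjoint lists of row indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, predict, xs, ys, folds):
    """Return the mean squared error on each fold for a fit/predict pair."""
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [j for j in range(len(xs)) if j not in held_out]
        # Fit only on the rows outside the current fold.
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        errors = [(predict(model, xs[j]) - ys[j]) ** 2 for j in test_idx]
        scores.append(mean(errors))
    return scores

# Toy model (an assumption for this sketch): a line through the origin.
def fit_slope(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def predict_slope(slope, x):
    return slope * x

xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]            # noise-free data, y = 2x
folds = make_folds(len(xs), k=5)      # reusable across models
scores = cross_validate(fit_slope, predict_slope, xs, ys, folds)
print(len(scores), mean(scores))
```

Because the folds live in one ordinary list, debugging reduces to inspecting that list, and comparing two models on identical splits is just a second call to `cross_validate` with the same `folds`.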

You see questions like this one: "How to implement a hold-out validation in R." No offense intended, whatsoever, to the asker. But many people just are not code-literate. The fact that people are doing cross-validation at all is enough to make me happy.

It sounds silly and trivial, but this comes from personal experience, having been that guy and having worked with many people who were that guy.