Who invented k-fold cross-validation?

cross-validation, history, references

I am looking for a reference to the paper where k-fold cross-validation was introduced (rather than just a good academic reference for the subject). Perhaps it is too far back in the mists of time to unambiguously identify the very first paper, so any early papers where the idea was used would be of interest.

The earliest I am aware of are

P. A. Lachenbruch and M. R. Mickey, “Estimation of error rates in
discriminant analysis,” Technometrics, vol. 10, no. 1, pp. 1–12, Feb.
1968.

and

A. Luntz and V. Brailovsky, “On estimation of characters obtained
in statistical procedure of recognition (in Russian),” Technicheskaya
Kibernetika, vol. 3, 1969.

but as far as I can tell they only cover leave-one-out cross-validation (my technical Russian isn't all it could be ;o).

Best Answer

One paper that might be worth consulting is

Stone M. Cross-validatory choice and assessment of statistical predictions. J. Royal Stat. Soc., Series B, 36(2), 111–147, 1974.

I have seen references to

Mosteller F. and Tukey J.W. Data analysis, including statistics. In Handbook of Social Psychology. Addison-Wesley, Reading, MA, 1968.

as an early, clear description of $k$-fold cross-validation, but I don't have access to this text.

The 1931 paper

Larson S. The shrinkage of the coefficient of multiple correlation. J. Educat. Psychol., 22:45–55, 1931.

is mentioned, e.g. by Stone, as an early example where a randomly selected validation set is put aside for later assessment of the model.
