When are Shao’s results on leave-one-out cross-validation applicable?

Tags: classification, cross-validation, model selection

In his paper *Linear Model Selection by Cross-Validation*, Jun Shao shows that for the problem of variable selection in multiple linear regression, leave-one-out cross-validation (LOOCV) is 'asymptotically inconsistent'. In plain English, it tends to select models with too many variables. In a simulation study, Shao shows that even for as few as 40 observations, LOOCV can underperform other cross-validation techniques.

This paper is somewhat controversial, and somewhat ignored (10 years after its publication, my chemometrics colleagues had never heard of it and were happily using LOOCV for variable selection…). There is also a belief (I am guilty of this) that its results extend somewhat beyond their original limited scope.
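The overselection phenomenon is easy to see in a small simulation sketch (this is my own illustration, not Shao's actual study; it assumes NumPy and uses the PRESS identity, which gives exact LOOCV residuals for OLS without refitting):

```python
import numpy as np

def press(X, y):
    # Exact LOOCV squared error for OLS via the hat-matrix shortcut:
    # the leave-one-out residual is e_i / (1 - h_ii).
    H = X @ np.linalg.pinv(X)        # hat matrix X (X'X)^{-1} X'
    e = y - H @ y                    # ordinary residuals
    h = np.diag(H)                   # leverages
    return np.mean((e / (1 - h)) ** 2)

def overfit_rate(n=40, runs=200, seed=0):
    # Fraction of runs in which LOOCV prefers a model containing a
    # pure-noise variable over the true model (hypothetical setup).
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(runs):
        x1 = rng.normal(size=n)
        x2 = rng.normal(size=n)               # irrelevant noise variable
        y = 2.0 * x1 + rng.normal(size=n)     # true model uses x1 only
        small = np.column_stack([np.ones(n), x1])
        big = np.column_stack([np.ones(n), x1, x2])
        if press(big, y) < press(small, y):
            count += 1                        # LOOCV picks the overfit model
    return count / runs
```

With n = 40, as in Shao's simulations, LOOCV selects the overfit model in a non-vanishing fraction of runs, and this fraction does not shrink toward zero as n grows, which is what "asymptotically inconsistent" means here.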

The question, then: how far do these results extend? Are they applicable to the following problems?

  1. Variable selection for logistic regression/GLM?
  2. Variable selection for Fisher LDA classification?
  3. Variable selection using SVM with finite (or infinite) kernel space?
  4. Comparison of models in classification, say SVM using different kernels?
  5. Comparison of models in linear regression, say comparing MLR to Ridge Regression?
  6. etc.

Best Answer

You need to specify the purpose of the model before you can say whether Shao's results are applicable. For example, if the purpose is prediction, then LOOCV makes good sense and the inconsistency of variable selection is not a problem. On the other hand, if the purpose is to identify the important variables and explain how they affect the response variable, then Shao's results are obviously important and LOOCV is not appropriate.

The AIC is asymptotically equivalent to LOOCV, and the BIC is asymptotically equivalent to leave-$v$-out CV with $v=n[1-1/(\log(n)-1)]$, although the BIC result holds for linear models only. So the BIC gives consistent model selection. Therefore a short-hand summary of Shao's result is that AIC is useful for prediction but BIC is useful for explanation.
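To get a feel for how large that validation set is, here is a minimal sketch of the formula above (the function name is my own):

```python
import math

def shao_v(n):
    # Leave-v-out size for which CV is asymptotically equivalent to BIC
    # in linear models: v = n * [1 - 1/(log(n) - 1)]
    return n * (1.0 - 1.0 / (math.log(n) - 1.0))
```

For example, `shao_v(100)` is about 72, i.e. nearly three-quarters of the data are held out for validation, and the held-out fraction grows slowly toward 1 as $n$ increases; consistent selection requires a much larger validation set than LOOCV's single point.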