Leave-One-Out Cross-Validation – How It Works and Selecting the Final Model


I have some data and I want to build a model (say, a linear regression model) from this data. As a next step, I want to apply Leave-One-Out Cross-Validation (LOOCV) to the model to see how well it performs.

If I understood LOOCV correctly, I build a new model for each of my samples (the test set), using every sample except that one (the training set). Then I use the model to predict the held-out sample and calculate the error $(\text{predicted} - \text{actual})$.

Next, I aggregate all the errors using a chosen function, for example the mean squared error. I can use this value to judge the quality (or goodness of fit) of the model.
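A minimal sketch of this procedure, assuming a simple linear regression and scikit-learn's LinearRegression (the data and variable names here are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                      # 20 samples, 2 predictors
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=20)

squared_errors = []
for i in range(len(X)):
    train = np.delete(np.arange(len(X)), i)       # every sample except sample i
    model = LinearRegression().fit(X[train], y[train])
    pred = model.predict(X[i:i + 1])[0]           # predict the held-out sample
    squared_errors.append((pred - y[i]) ** 2)     # (predicted - actual)^2

loocv_mse = np.mean(squared_errors)               # aggregate with mean squared error
print(f"LOOCV mean squared error: {loocv_mse:.4f}")
```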

Question: Which model do these quality values apply to? In other words, which model should I choose if I find the metrics generated by LOOCV acceptable for my case? LOOCV looked at $n$ different models (where $n$ is the sample size); which one is the model I should choose?

  • Is it the model that uses all the samples? That model was never fitted during the LOOCV process!
  • Is it the model with the smallest error?

Best Answer

It is best to think of cross-validation as a way of estimating the generalisation performance of models generated by a particular procedure, rather than of the model itself. Leave-one-out cross-validation is essentially an estimate of the generalisation performance of a model trained on $n-1$ samples of data, which is generally a slightly pessimistic estimate of the performance of a model trained on $n$ samples.

Rather than choosing one model, fit the model to all of the data and use LOOCV to provide a slightly conservative estimate of the performance of that model.
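A sketch of this workflow, assuming scikit-learn and the X, y arrays from the example above: LOOCV is used only to estimate performance, and the model actually kept is trained on the full dataset.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# LOOCV estimate of the generalisation error (scikit-learn reports
# negative MSE so that higher scores are better).
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
loocv_mse = -scores.mean()
print(f"Estimated generalisation MSE (LOOCV): {loocv_mse:.4f}")

# The model used going forward is fitted on all of the data.
final_model = LinearRegression().fit(X, y)
```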

Note however that LOOCV has a high variance (the value you will get varies a lot if you use a different random sample of data) which often makes it a bad choice of estimator for performance evaluation, even though it is approximately unbiased. I use it all the time for model selection, but really only because it is cheap (almost free for the kernel models I am working on).
