Solved – How to calculate AUC with leave-one-out CV

auc, cross-validation, out-of-sample, roc

In a binary response setting (data matrix D with N rows) I have performed LOOCV and obtained a final lambda*. The average CV error for this lambda* is also, as I understand it, an unbiased estimator for my out-of-sample error. I then use this lambda* to train my final model on all the data. There are three pathways I can take to generate an ROC. So far I've only seen how to estimate the error in an unbiased manner, not the ROC or AUC, so I'm not sure which ROC or AUC represents the real out-of-sample character of the model. Typical customers for the model have no intuition for translating out-of-sample error (deviance) into performance. They do, however, like to look at the ROC and AUC (the AUC being just another summary statistic).

The first is to use the final model (trained on all the data) to score all the rows. I think this gives the in-sample ROC (and hence the in-sample AUC). This takes only N*M training events (the N leave-one-out fits times the M lambda candidates).

The second is to get the final model as above, go back to each LOO data set, say D(i), retrain with the final lambda*, and calculate the score for x(i), the left-out row. This is sort of an out-of-sample ROC. This takes N*(M + 1) training events.

The third is to get the final model as above, go back through the LOO data sets D(i), find an optimal lambda*(i) for each using something like K-fold CV, train on D(i) with it, and then calculate the score for x(i). This would seem to be yet another version of the out-of-sample ROC. This takes N*M*K training events.

I'm not sure what to call these three ROC curves or if there is another standard way to generate an in-sample ROC estimate and an out-of-sample ROC estimate.

Best Answer

The average CV error for this lambda* is also, as I understand it, an unbiased estimator for my out-of-sample error.

No. It is an optimistically biased estimate.

To get an unbiased* estimate of out-of-training error, you need to wrap your whole training procedure (including the optimization of λ) in another independent cross validation. See nested (aka double) cross validation.

Now there are two ways/approaches to get from nested cross validation to the final model:

  1. Use the outer cross validation to check/ensure that the λ* found for each of the outer surrogate models is the same (read: sufficiently similar) to take it as the λ* for the final model.
    This seems to be what you are planning. However, you'll then need to specify (beforehand!) what variability in λ* is acceptable.
  2. You can treat the inner cross validation (i.e. everything you've done so far) as the training method. You then say that the out-of-sample performance of this training method on the data set at hand is just what your outer cross validation loop measures.
    From that point of view, you'd run the λ* optimization again during training on the whole data set (a minimal code sketch follows this list). This is the point of view I prefer: it saves you the difficulty of deciding what to do if the λ* variability in the outer cross validation is just outside your specified target, while you can still evaluate and interpret this variability.
    From that point of view, you have now basically finished training the final model, and still need to measure the out-of-sample error.
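
As a minimal sketch of point of view 2 (assumptions for illustration only: an L1-penalized logistic regression in scikit-learn, with a grid over C = 1/λ standing in for your λ candidates and synthetic data standing in for D):

    # Nested (double) cross validation: the inner CV that picks lambda is
    # part of the training method, the outer CV measures the out-of-sample
    # performance of that training method.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

    # Synthetic stand-in for the data matrix D and the binary response.
    X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                               random_state=0)

    lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000)
    param_grid = {"C": np.logspace(-3, 3, 13)}   # M candidate lambdas (as C = 1/lambda)

    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

    # Inner loop: the lambda optimization, treated as part of training.
    tuned_model = GridSearchCV(lasso_logit, param_grid, scoring="roc_auc", cv=inner_cv)

    # Outer loop: out-of-sample performance of the whole training procedure.
    outer_auc = cross_val_score(tuned_model, X, y, scoring="roc_auc", cv=outer_cv)
    print("nested-CV AUC: %.3f +/- %.3f" % (outer_auc.mean(), outer_auc.std()))

    # Final model: rerun the lambda optimization on the whole data set.
    final_model = tuned_model.fit(X, y).best_estimator_

(Using cross_validate(..., return_estimator=True) instead of cross_val_score would additionally let you inspect the λ* chosen in each outer surrogate, which is the variability check of point 1.)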

ROC / AuROC with cross validation

Again, two possibilities:

  • You can always calculate separate ROCs for each of the cross validation surrogate models. For LOO, however, they'd be too crude to be useful: only one point besides the trivial (1/0) and (0/1) points.
  • If your models are set up in a way that the predicted scores are on the same scale for all surrogate models, you can pool the predicted scores, just like you pool the dichotomized predictions for calculating the cross validation error. This gives you one pooled ROC (see the sketch below).

See my answer here for more details and a picture.
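
As a minimal sketch of the pooled variant (same hypothetical L1-penalized logistic regression and synthetic data as above; cross_val_predict collects one held-out score per row, so a single ROC/AUC can be computed from the pooled scores):

    # Pooling the CV-predicted scores into one ROC / AUC.
    # Assumption: the surrogate models' scores (here: predicted probabilities)
    # are on a comparable scale, as required above.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
                               random_state=0)
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0, max_iter=1000)

    # One held-out probability per row: each score comes from a surrogate
    # model that never saw that row.
    scores = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                               method="predict_proba")[:, 1]

    fpr, tpr, _ = roc_curve(y, scores)   # one pooled ROC curve
    print("pooled LOO AUC: %.3f" % roc_auc_score(y, scores))

Note that this pooled curve only reflects out-of-sample performance of the whole procedure if λ is chosen inside each surrogate, e.g. by passing the GridSearchCV wrapper from the previous sketch as the model; with a fixed λ* that was itself tuned on the same data, it is still optimistically biased, as discussed above.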


The first is to use the final model (trained on all the data) to score all the rows. I think this gives the in-sample ROC (and hence the in-sample AUC). This takes only N*M training events (the N leave-one-out fits times the M lambda candidates).

Yes, that's a training error estimate.

The second is to get the final model as above, go back to each LOO data set, say D(i), retrain with the final lambda*, and calculate the score for x(i), the left-out row. This is sort of an out-of-sample ROC. This takes N*(M + 1) training events.

That's also essentially a training error estimate (although it is a common mistake to take it for a generalization error): λ* was determined by the LOOCV over the whole data set, so the left-out row x(i) has already been used to choose λ*, and the prediction for x(i) is not independent of the model.

The third is to get the final model as above, go back through the LOO data sets D(i), find an optimal lambda*(i) for each using something like K-fold CV, train on D(i) with it, and then calculate the score for x(i). This would seem to be yet another version of the out-of-sample ROC. This takes N*M*K training events.

If I got you correctly, that's the nested cross validation I've been talking about :-) => do this.

Note that the computation may be drastically reduced:

  • LOO doesn't pay well for the increased computational effort over k-fold cross validation (unless N is so small that anyway only 1 or 2 samples can be left out without totally changing the model).
    k-fold with k between maybe 5 and 10 is usually the way to go. If you can spend some more computation, it may be better spent on iterations/repetitions of the k-fold in order to check stability, in particular since stability of the solution would be an important criterion when validating the optimization of λ (a minimal sketch of such a stability check follows this list).
  • You may want to have a closer look at LASSO λ optimization, e.g. Tibshirani and Taylor: The Solution Path of the Generalized Lasso. (Disclaimer: I haven't read or used that; it's just a hint I have in the back of my mind, so take it with a grain of salt.)
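
As a minimal sketch of such a stability check (same hypothetical setup as in the sketches above): repeated k-fold lets you look at how much the chosen λ* and the fold-wise AUCs scatter.

    # Repeated k-fold to check stability of lambda* and of the AUC estimate.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

    X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                               random_state=0)
    tuned_model = GridSearchCV(
        LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000),
        {"C": np.logspace(-3, 3, 13)}, scoring="roc_auc", cv=5)

    rkf = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=2)
    best_Cs, aucs = [], []
    for train_idx, test_idx in rkf.split(X, y):
        tuned_model.fit(X[train_idx], y[train_idx])
        best_Cs.append(tuned_model.best_params_["C"])
        aucs.append(roc_auc_score(y[test_idx],
                                  tuned_model.predict_proba(X[test_idx])[:, 1]))

    # A wide spread in the chosen C values (or in the fold AUCs) signals an
    # unstable lambda optimization.
    print("distinct C values chosen:", sorted(set(best_Cs)))
    print("AUC: %.3f +/- %.3f" % (np.mean(aucs), np.std(aucs)))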

* slightly pessimistically biased
