Solved – Calculate LOO-AUC values using glmnet


I have a matrix (x) containing 55 samples (rows) and 10000 independent variables (columns). The observations are binary, healthy or ill {0,1} (y). I want to perform leave-one-out cross-validation and determine the Area Under the Curve (AUC) for each of the variables. To do so, I need the nfolds parameter to be equal to the number of observations (i.e. 55). Am I right?

result=cv.glmnet(x,y,nfolds=55,type.measure="auc",family="binomial")

And I'm getting these warnings:

"Warning messages:
1: Too few (< 10) observations per fold for type.measure='auc' in   
cv.lognet; changed to type.measure='deviance'. Alternatively, use smaller  
value for nfolds 
2: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per  
fold"

What am I doing wrong?

I want to get LOO-AUCs for each variable.

I'd really appreciate any help. Thank you.

Best Answer

The glmnet documentation describes the nfolds argument as follows:

"number of folds - default is 10. Although nfolds can be as large as the sample
size (leave-one-out CV), it is not recommended for large datasets. Smallest
value allowable is nfolds=3"

So it appears that you can indeed set nfolds equal to the sample size to perform leave-one-out CV.

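However, as the warning message indicates, the problem you are facing is that glmnet requires at least 10 observations per fold to calculate the AUC, which needs a way to rank the test cases in each fold. With leave-one-out CV every fold holds a single observation, so glmnet switches to type.measure='deviance'.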

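Think about it: if a test fold contains only one case, how are you supposed to rank it? The AUC compares the scores of positive cases against those of negative cases, so a fold needs at least one of each.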
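To make the ranking argument concrete, here is a minimal base-R sketch of the rank-based (Mann-Whitney) form of the AUC. The helper name auc_rank is made up for illustration and is not part of glmnet. With a single held-out case, one of the two classes is empty, so the denominator is zero and the result is undefined:

```r
# Rank-based (Mann-Whitney) AUC.
# NOTE: auc_rank is a hypothetical helper, not a glmnet function.
auc_rank <- function(score, label) {
  n_pos <- sum(label == 1)   # number of positive (ill) cases
  n_neg <- sum(label == 0)   # number of negative (healthy) cases
  r <- rank(score)           # average ranks, so ties count as 1/2
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# A fold with two positives and two negatives, perfectly separated
auc_rank(c(0.9, 0.8, 0.3, 0.1), c(1, 1, 0, 0))  # 1

# A leave-one-out fold: one case, so n_neg is 0 and the ratio is 0/0
auc_rank(0.7, 1)                                 # NaN
```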

This is only an issue because of the performance measure (AUC) you have chosen. Measures that do not require ranking, i.e. those that can be computed from a single test case, such as mean squared error (type.measure="mse") or deviance, do not trigger this warning.
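If a leave-one-out AUC is still wanted, one possible workaround (not something the warning itself suggests) is to let cv.glmnet run leave-one-out CV with the deviance measure, keep the held-out predictions via keep=TRUE, and compute a single AUC per lambda from all pooled predictions. The sketch below uses simulated data as a stand-in for x and y, a smaller number of columns for speed, and the same hypothetical auc_rank helper. It assumes that predictions from the different leave-one-out fits are comparable enough to be ranked together, which is a known weakness of pooled LOO AUC, and it yields one AUC per lambda value rather than one per variable.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the data in the question (assumption: p reduced for speed)
n <- 55; p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Leave-one-out CV on the deviance scale; keep = TRUE stores the
# held-out ("prevalidated") predictions in cvfit$fit.preval
cvfit <- cv.glmnet(x, y, family = "binomial", nfolds = n,
                   type.measure = "deviance", grouped = FALSE, keep = TRUE)

# Hypothetical helper: rank-based (Mann-Whitney) AUC
auc_rank <- function(score, label) {
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  r <- rank(score)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Drop lambda columns that were not reached in every fold (contain NA),
# then compute one pooled AUC per remaining lambda column
ok <- colSums(is.na(cvfit$fit.preval)) == 0
loo_auc <- apply(cvfit$fit.preval[, ok, drop = FALSE], 2, auc_rank, label = y)

max(loo_auc)  # best pooled leave-one-out AUC along the lambda path
```

Because the AUC depends only on ranks, it does not matter whether fit.preval holds linear predictors or probabilities.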