I have a matrix (x) containing 55 samples (rows) and 10000 independent variables (columns). The observations are binary, healthy or ill {0,1} (y). I want to perform leave-one-out cross-validation and determine the Area Under the Curve (AUC) for each of the variables. To do so I need the nfolds parameter to be equal to the number of observations (i.e., 55). Am I right?
result <- cv.glmnet(x, y, nfolds = 55, type.measure = "auc", family = "binomial")
And I'm getting these warnings:
"Warning messages:
1: Too few (< 10) observations per fold for type.measure='auc' in
cv.lognet; changed to type.measure='deviance'. Alternatively, use smaller
value for nfolds
2: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
fold"
What am I doing wrong?
I want to get LOO-AUCs for each variable.
I would really appreciate any help. Thank you.
Best Answer
From the package documentation it appears that you can indeed set nfolds equal to the sample size to perform leave-one-out CV.
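However, as the warning message indicates, the problem is that computing the AUC requires ranking your test cases, and cv.glmnet will only do this when there are at least 10 observations per fold. With nfolds = 55 each fold holds a single observation, so it falls back to type.measure = 'deviance'.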
Think about it: if a fold contains only one test case, how could you rank it? The AUC compares ill cases against healthy ones, so the test set needs at least one case of each class.
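To make the ranking argument concrete, here is a minimal base-R sketch of the AUC as a rank statistic (the Mann-Whitney form). The function name `auc_rank` is hypothetical and not part of glmnet. It returns `NA` whenever the test set lacks one of the two classes, which is always the case for a single left-out observation.

```r
# AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen ill case (y = 1) scores higher than a randomly chosen
# healthy case (y = 0), with ties counted as one half.
auc_rank <- function(score, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  if (n1 == 0 || n0 == 0) {
    # No ranking is possible without at least one case of each class;
    # a single left-out observation can never satisfy this.
    return(NA_real_)
  }
  r <- rank(score)  # average ranks handle tied scores
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

print(auc_rank(c(0.9, 0.8, 0.3, 0.1), c(1, 1, 0, 0)))  # 1
print(auc_rank(c(0.9, 0.2, 0.6, 0.1), c(1, 1, 0, 0)))  # 0.75
print(auc_rank(0.7, 1))                                 # NA: one test case
```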
This is only an issue because of the performance measure (AUC) you have chosen. Measures that do not require ranking, i.e., those that can be computed from a single test case, such as mean squared error, will not trigger this warning.
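If what you want is one LOO-AUC per variable, one common workaround (a swapped-in approach, not what cv.glmnet does with type.measure = "auc") is to pool the leave-one-out predictions: score each left-out sample with a model fit on the other 54, then compute a single AUC over all 55 out-of-fold scores. The sketch below assumes a univariate logistic regression (base R `glm`) per column; the helper names `auc_pairs` and `loo_auc_one` and the simulated data are hypothetical stand-ins for your x and y. For the joint glmnet model, cv.glmnet's `keep = TRUE` argument returns out-of-fold fits in `fit.preval`, which could be pooled the same way.

```r
# Pairwise AUC: share of (ill, healthy) pairs in which the ill case
# scores higher; ties count as one half. NA if a class is missing.
auc_pairs <- function(score, y) {
  s1 <- score[y == 1]
  s0 <- score[y == 0]
  if (length(s1) == 0 || length(s0) == 0) return(NA_real_)
  mean(outer(s1, s0, ">") + 0.5 * outer(s1, s0, "=="))
}

# Hypothetical helper: pooled LOO-AUC for one predictor column xj.
loo_auc_one <- function(xj, y) {
  n <- length(y)
  pred <- numeric(n)
  for (i in seq_len(n)) {
    # Fit a univariate logistic regression on all samples except i ...
    train <- data.frame(v = xj[-i], y = y[-i])
    fit <- glm(y ~ v, family = binomial, data = train)
    # ... and score the left-out sample (link scale; AUC only uses ranks)
    pred[i] <- predict(fit, newdata = data.frame(v = xj[i]), type = "link")
  }
  # One AUC over all n pooled out-of-fold scores
  auc_pairs(pred, y)
}

# Simulated stand-in for the real 55 x 10000 matrix (hypothetical data)
set.seed(1)
n <- 55
p <- 5
y <- rbinom(n, 1, 0.5)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
x[, 1] <- x[, 1] + 1.5 * y  # make variable 1 informative

loo_aucs <- apply(x, 2, loo_auc_one, y = y)
print(round(loo_aucs, 3))
```

With 10000 columns this amounts to 550,000 small glm fits, so expect it to take a while. Also be aware that pooled LOO estimates of AUC can be pessimistically biased for weak or uninformative predictors, so noise variables may land somewhat below 0.5.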