Leave-one-subject-out cross validation in Caret

caret, cross-validation, r, sampling

Hi Dear Colleagues,

I am wondering how to correctly set up leave-one-subject-out cross-validation (LOSO) for the train() function in caret.

Here is my example code:

dat <- as.data.frame(cbind(rnorm(1:500,1),rnorm(1:500,10),rnorm(1:500,5),
          rnorm(1:500,100),c(rep('1',100),rep('2',100),rep('3',100),rep('4',100),
                                rep('5',100)),rep(c(rep('X0',50),rep('X1',50)),5)))

colnames(dat) <- c('var1', 'var2', 'var3', 'var4', 'subject','class')

What would you specify in trainControl() instead of "cv" (LGOCV?)? How would you modify the following code?

svmFit <- train(dat,y,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = MySVMTuneGrid,
                trControl = trainControl(method = "cv", number = 10, classProbs =  TRUE))

Thank you very much for your time!

Best Answer

If each subject has exactly one row, then method = "LOOCV" would do it. Otherwise, you will have to set up your own resampling indices and supply them via the index argument of trainControl(). At that point, the value of method doesn't matter.
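A small base-R check of why "LOOCV" only covers the one-row-per-subject case: with a made-up subject vector (the names `subject`, `loso` and `loo` are hypothetical, not caret objects), the leave-one-subject-out training sets are identical to the leave-one-out ones. As soon as a subject has several rows, the two schemes diverge and you need custom indices.

```r
# Hypothetical example: four subjects, each contributing exactly one row
subject <- c("s1", "s2", "s3", "s4")
rows <- seq_along(subject)

# Leave-one-subject-out: train on every row whose subject differs from the held-out one
loso <- lapply(unique(subject), function(s) which(subject != s))

# Leave-one-out: train on every row except the held-out row
loo <- lapply(rows, function(i) setdiff(rows, i))

identical(loso, loo)  # TRUE only because each subject has a single row
```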

You could do something like:

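library(caret)

# One list element per subject: the rows used for *training* when that subject is held out
subs <- unique(dat$subject)
model_these <- vector(mode = "list", length = length(subs))
for (i in seq_along(subs)) {
  model_these[[i]] <- which(dat$subject != subs[i])
}
names(model_these) <- paste0("Subject", subs)

# MySVMTuneGrid is the tuning grid from the question.
# When index is supplied, method no longer determines the resamples.
svmFit <- train(class ~ var1 + var2 + var3 + var4,
                data = dat,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = MySVMTuneGrid,
                trControl = trainControl(method = "cv",
                                         index = model_these,
                                         classProbs = TRUE))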
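If you want to confirm that the folds really hold out one whole subject at a time, here is a base-R sanity check. It rebuilds only the subject layout from the question (5 subjects with 100 rows each) and also constructs the held-out rows explicitly; `index_out` is a name chosen for this sketch. trainControl() accepts such a list through its indexOut argument, but when indexOut is omitted caret uses the rows not in index, which here gives the same split.

```r
# Subject layout from the question: subjects "1".."5", 100 rows each
subject <- rep(as.character(1:5), each = 100)
subs <- unique(subject)

# Training rows per fold (same construction as model_these above)
index <- lapply(subs, function(s) which(subject != s))
# Held-out rows per fold: exactly the rows of the left-out subject
index_out <- lapply(subs, function(s) which(subject == s))
names(index) <- paste0("Subject", subs)
names(index_out) <- paste0("Subject", subs)

# Each fold: training and held-out rows are disjoint, together cover all rows,
# and the held-out rows all belong to a single subject
for (k in seq_along(subs)) {
  stopifnot(length(intersect(index[[k]], index_out[[k]])) == 0,
            length(union(index[[k]], index_out[[k]])) == length(subject),
            all(subject[index_out[[k]]] == subs[k]))
}
```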

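(Note that the way your test data set is built converts var1-var4 to character, because cbind() on a mix of numeric and character vectors produces a character matrix, so I didn't test this.)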
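One way to build the same toy data while keeping var1-var4 numeric is to skip cbind() and call data.frame() directly. This is a sketch that mirrors the question's layout (`rnorm(1:500, 1)` in the question draws 500 values with mean 1); the seed is an arbitrary addition for reproducibility. The outcome is made a factor, since train() treats a factor outcome as classification, and the labels X0/X1 are already valid R names, which caret expects when classProbs = TRUE.

```r
set.seed(1)  # arbitrary seed, only for reproducible draws

dat <- data.frame(
  var1 = rnorm(500, mean = 1),
  var2 = rnorm(500, mean = 10),
  var3 = rnorm(500, mean = 5),
  var4 = rnorm(500, mean = 100),
  # 5 subjects with 100 consecutive rows each, as in the question
  subject = rep(as.character(1:5), each = 100),
  # within each subject: 50 rows of X0 followed by 50 rows of X1
  class = factor(rep(rep(c("X0", "X1"), each = 50), times = 5))
)

str(dat)
```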

Max