Caret Package in R – Understanding Outer Cross-Validation Cycle

caret, cross-validation, prediction, r

Could somebody provide example code showing how best to implement an outer cross-validation loop using the caret package in R? The package provides a convenient trainControl() function to adjust the inner cross-validation. However, I would like to embed this in multiple outer cross-validation cycles to get a more stable estimate of the prediction performance of the fitted models.

Best Answer

Inner and outer CV (nested CV) are used for classifier selection, not to get a better estimate of prediction performance. To get a more stable estimate, use repeated CV. For example, to run 5-fold CV repeated 10 times, use:

trainControl(method = "repeatedcv",number = 5,
             ## repeated ten times
             repeats = 10)
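As a minimal sketch of how that control object might be passed to train(), the example below uses the built-in iris data and a k-nearest-neighbours model. Both are assumptions chosen only for illustration and are not part of the original answer.

```r
library(caret)

set.seed(1)  # makes the fold assignment reproducible

# 5-fold CV repeated 10 times, as in the call above
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10)

# iris and method = "knn" are illustrative stand-ins for your data and model
fit <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

# One row per held-out fold (5 folds x 10 repeats) for the selected tuning value
nrow(fit$resample)

# Mean and spread of accuracy across all resamples
mean(fit$resample$Accuracy)
sd(fit$resample$Accuracy)
```

The spread across resamples gives a rough idea of how stable the performance estimate is.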

But if what you really want is nested CV (for example, to choose between a random forest and an SVM), then as far as I know you have to do the outer CV explicitly. What I did for an outer 5-fold, inner 10-fold CV was:

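ntrain <- length(ytrain)
# Row indices of the training portion of each outer fold
train.ext <- createFolds(ytrain, k = 5, returnTrain = TRUE)
# The complementary rows form each outer test set
test.ext <- lapply(train.ext, function(x) (1:ntrain)[-x])

for (i in 1:5) {
    # Inner 10-fold CV runs only on the rows of outer training fold i
    model <- train(Class ~ ., data = training[train.ext[[i]], ],
                   trControl = trainControl(method = "cv", number = 10),
                   ...
    ...
}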
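For a complete, runnable version of the same idea, here is a hedged sketch. It assumes the iris data and two candidate methods ("knn" and "rpart") purely for illustration; these replace the elided parts of the loop above and are not what the original answer used. Selecting the winner by inner-CV accuracy is also an assumption about how the choice would be made.

```r
library(caret)

set.seed(42)

# Outer 5-fold split; returnTrain = TRUE returns the training row indices
outer_train <- createFolds(iris$Species, k = 5, returnTrain = TRUE)

# Inner 10-fold CV, run by train() inside each outer training set
inner_ctrl <- trainControl(method = "cv", number = 10)

# Illustrative candidates; swap in e.g. "rf" or "svmRadial" if installed
candidates <- c("knn", "rpart")

outer_acc <- sapply(outer_train, function(train_idx) {
  train_set <- iris[train_idx, ]
  test_set  <- iris[-train_idx, ]

  # Fit every candidate using inner CV on the outer training rows only
  fits <- lapply(candidates, function(m) {
    train(Species ~ ., data = train_set, method = m, trControl = inner_ctrl)
  })

  # Select the candidate with the best inner-CV accuracy
  inner_acc <- sapply(fits, function(f) max(f$results$Accuracy))
  best <- fits[[which.max(inner_acc)]]

  # Score the selected model on the held-out outer fold
  mean(predict(best, newdata = test_set) == test_set$Species)
})

outer_acc        # accuracy on each outer fold
mean(outer_acc)  # nested-CV estimate for the whole selection procedure
```

Because the outer test rows are never seen during the inner CV or the model choice, the averaged outer accuracy estimates the performance of the selection procedure as a whole rather than of a single fitted model.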