Caret Package in R – Understanding Outer Cross-Validation Cycle

Tags: caret, cross-validation, prediction, r

Could somebody provide a good example of how best to implement an outer cross-validation cycle using the caret package in R? The package provides a convenient trainControl() function (passed to train() via its trControl argument) to adjust the inner cross-validation. However, I would like to embed this in multiple outer cross-validation cycles to get a more stable estimate of the prediction performance of the estimated models.

Best Answer

Inner and outer (nested) CV is used to perform classifier selection, not to get a better estimate of prediction performance. To get a more stable estimate, use repeated CV. For example, to perform 5-fold CV repeated 10 times, use:

library(caret)
ctrl <- trainControl(method = "repeatedcv", number = 5,
                     ## repeated ten times
                     repeats = 10)
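To see what repeated CV actually produces, here is a minimal sketch on the built-in iris data. The dataset, `method = "knn"` (chosen only because caret fits it without extra packages) and `tuneLength = 3` are assumptions for illustration, not part of the original answer. With the default `returnResamp = "final"`, `fit$resample` holds one row per resample (5 folds x 10 repeats) for the selected tuning value, and `fit$results` averages over those resamples.

```r
# Minimal sketch: 5-fold CV repeated 10 times on iris.
# Assumption: method = "knn" is used only because it needs no extra packages.
library(caret)

set.seed(42)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10)
fit <- train(Species ~ ., data = iris, method = "knn",
             trControl = ctrl, tuneLength = 3)

# Mean and SD of accuracy across all resamples, for each candidate k
print(fit$results[, c("k", "Accuracy", "AccuracySD")])

# Per-resample performance for the selected k (default returnResamp = "final")
head(fit$resample)
```

The spread in `AccuracySD` is what repeated CV helps to pin down: averaging over 50 resamples instead of 5 reduces the variance of the estimate, but every resample still feeds the same tuning decision.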

But if what you really want is nested CV (for example, to choose between a random forest and an SVM), then as far as I know you have to code the outer CV loop explicitly. What I did for an outer 5-fold, inner 10-fold CV was:

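library(caret)

# assumes `training` (with outcome column Class) and `ytrain` already exist
ntrain <- length(ytrain)
# outer 5-fold split: row indices of the training part of each outer fold
train.ext <- createFolds(ytrain, k = 5, returnTrain = TRUE)
# outer test rows are the complement of each outer training set
test.ext <- lapply(train.ext, function(x) (1:ntrain)[-x])

for (i in 1:5) {
    # inner 10-fold CV (tuning) runs inside train() on the outer training rows only
    model <- train(Class ~ ., data = training[train.ext[[i]], ],
                   trControl = trainControl(method = "cv", number = 10)
                   # , ...  further train() arguments
                   )
    # ...  evaluate model on training[test.ext[[i]], ]
}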
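A self-contained version of the same idea, as a sketch under stated assumptions: iris replaces `training`, and the caret methods `"knn"` and `"rpart"` stand in for the random forest and SVM mentioned above (they avoid extra package dependencies). Both candidates are evaluated on the same outer folds, and the inner 10-fold CV inside `train()` never sees the outer test rows.

```r
# Minimal sketch of nested CV with caret: outer 5-fold loop written by hand,
# inner 10-fold CV handled by train() for tuning.
# Assumptions: iris data; "knn" and "rpart" stand in for random forest / SVM.
library(caret)

set.seed(1)
y <- iris$Species
n <- nrow(iris)

# Outer folds: each element holds the row indices used for inner training
outer_train <- createFolds(y, k = 5, returnTrain = TRUE)

candidate_methods <- c("knn", "rpart")

# Outer-fold accuracy: one row per outer fold, one column per method
outer_acc <- sapply(candidate_methods, function(m) {
  sapply(outer_train, function(tr) {
    te <- setdiff(seq_len(n), tr)  # held-out outer test rows
    fit <- train(Species ~ ., data = iris[tr, ], method = m,
                 trControl = trainControl(method = "cv", number = 10))
    # the tuned model is scored on rows it never saw during inner CV
    mean(predict(fit, newdata = iris[te, ]) == y[te])
  })
})

print(outer_acc)
colMeans(outer_acc)  # outer-CV estimate for each (tuning + fitting) procedure
```

Reusing the same outer folds for every candidate makes the comparison paired. Once a method is chosen this way, one would typically refit it with `train()` on all the data; the outer-CV numbers describe the whole tuning procedure rather than that single final fit.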

Related Solutions

Solved – Number of principal components when preprocessing using PCA in caret package in R

By default, caret keeps the components that explain 95% of the variance.
But you can change it by using the thresh parameter.

# Example
preProcess(training, method = "pca", thresh = 0.8)
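To check how many components a given `thresh` keeps, a minimal sketch like the following can help. It assumes the four numeric iris columns stand in for `training`, and cross-checks the result against `prcomp()` on standardized data, since caret's `"pca"` step centres and scales the predictors.

```r
# Minimal sketch: how many PCA components preProcess() keeps for a given thresh.
# Assumption: the four numeric iris columns stand in for `training`.
library(caret)

x <- iris[, 1:4]
pp <- preProcess(x, method = "pca", thresh = 0.8)
pcs <- predict(pp, x)      # columns PC1, PC2, ... for the kept components
n_kept <- ncol(pcs)

# Cross-check: cumulative proportion of variance of the standardized PCs
v <- prcomp(x, scale. = TRUE)$sdev^2
cum_var <- cumsum(v) / sum(v)
print(n_kept)
print(cum_var)
```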

You can also set a particular number of components by setting the pcaComp parameter.

# Example
preProcess(training, method = "pca", pcaComp = 7)

If you use both parameters, pcaComp has precedence over thresh.
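The precedence rule can be seen directly in a small sketch (again assuming iris's numeric columns stand in for `training`): with both arguments supplied, the number of output columns follows `pcaComp`, not `thresh`.

```r
# Minimal sketch: pcaComp overrides thresh when both are supplied.
# Assumption: iris's numeric columns stand in for `training`.
library(caret)

x <- iris[, 1:4]
pp_both <- preProcess(x, method = "pca", thresh = 0.8, pcaComp = 3)
pcs_both <- predict(pp_both, x)
print(names(pcs_both))   # three PC columns, as requested by pcaComp
```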

Please see: https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/preProcess
