Solved – How to use rfe object with function pickSizeTolerance in R package caret

caret, feature selection, rms

I ran caret's recursive feature elimination (rfe) with randomForest. I called rfe with method repeatedcv and the parameter maximize = TRUE, so the optimal set of variables was chosen based on the best RMSE.

However, I would like to see the minimum "tolerable" set of predictor variables without rerunning rfe with parameter maximize = FALSE. It takes 24 hours to rerun my analysis.

Apparently, caret's function pickSizeTolerance does the trick, as described on caret's webpage: http://caret.r-forge.r-project.org/featureselection.html

How can I use the existing rfe object to get the "tolerable" set of variables?

Reproducible code:

library(caret)
inTrain <- createDataPartition(y = iris[, 4],
                               p = .66,
                               list = FALSE)
training <- iris[ inTrain,]
testing <- iris[-inTrain,]
ctrl <- rfeControl(functions = rfFuncs, method = "repeatedcv", repeats = 5,
                   verbose = TRUE, returnResamp = "all")
rfProfile <- rfe(training[, -4], training[, 4], sizes = c(2, 3),
                 rfeControl = ctrl, newdata = testing[, -4])

The object rfProfile$resample includes all the metrics, but how do I calculate the tolerable set from them?

Best Answer

OK, the functions pickSizeTolerance and pickSizeBest are documented in caret's help page ?rfFuncs.
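
For orientation, here is a minimal base-R sketch of the kind of rule pickSizeTolerance is documented to apply: take the smallest subset size whose metric is within tol percent of the best value. The data frame res is a hypothetical stand-in for rfProfile$results with made-up numbers, and tolerable_size is an illustrative helper of my own, not a caret function.

```r
# Hypothetical stand-in for rfProfile$results (made-up numbers, not real output)
res <- data.frame(Variables = c(2, 3, 4),
                  RMSE      = c(0.205, 0.200, 0.202))

# Sketch of a tolerance rule for a metric that should be minimised:
# return the smallest subset size whose loss relative to the best
# value is at most tol percent
tolerable_size <- function(x, metric, tol = 1.5) {
  best <- min(x[[metric]])
  pct_loss <- (x[[metric]] - best) / best * 100
  min(x$Variables[pct_loss <= tol])
}

# Subset size with the numerically best (lowest) RMSE
size_best <- res$Variables[which.min(res$RMSE)]

# Smallest subset size within 1.5 and within 5 percent of the best RMSE
size_tol15 <- tolerable_size(res, "RMSE", tol = 1.5)
size_tol5  <- tolerable_size(res, "RMSE", tol = 5)
```

Note that the result is a subset size (a value from the Variables column), not a row number, which matters when the sizes do not start at 1.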

The rfProfile object from the reproducible code in the question can be plugged into the documentation's example:

example <- data.frame(RMSE = rfProfile$results$RMSE, Variables = rfProfile$results$Variables)

## Percent Loss in performance (positive)
example$PctLoss <- (example$RMSE - min(example$RMSE))/min(example$RMSE)*100

xyplot(RMSE ~ Variables, data= example)
xyplot(PctLoss ~ Variables, data= example)

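absoluteBest <- pickSizeBest(example, metric = "RMSE", maximize = FALSE)
## tol = 1.5 (percent) is the default; written out here so it can be changed
withinTol <- pickSizeTolerance(example, metric = "RMSE", tol = 1.5, maximize = FALSE)

## Both functions return a subset size (a value of Variables), not a row index,
## so look the RMSE up by matching on Variables
cat("numerically optimal:",
    example$RMSE[example$Variables == absoluteBest],
    "RMSE with", absoluteBest, "variables\n")
cat("Accepting a 1.5 pct loss:",
    example$RMSE[example$Variables == withinTol],
    "RMSE with", withinTol, "variables\n")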
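
The code above only gives the tolerable subset size, not the variable names. One way to get from a size to names is to average the importance scores over resamples and keep the top entries. Below is a rough base-R sketch of that idea. The data frame imp is a hypothetical stand-in for rfProfile$variables with made-up numbers, its column names (var, Overall, Resample) are an assumption to verify with str(rfProfile$variables), and top_vars is an illustrative helper, not a caret function and not necessarily how caret itself selects variables.

```r
# Hypothetical stand-in for rfProfile$variables: one importance score per
# variable per resample (made-up numbers; column names are an assumption)
imp <- data.frame(
  var      = rep(c("Petal.Length", "Species", "Sepal.Length", "Sepal.Width"),
                 times = 2),
  Overall  = c(30, 20, 10, 5, 28, 22, 12, 4),
  Resample = rep(c("Fold1.Rep1", "Fold2.Rep1"), each = 4)
)

# Average the importance of each variable across resamples, sort in
# decreasing order, and keep the names of the `size` highest-ranked variables
top_vars <- function(imp, size) {
  avg <- aggregate(Overall ~ var, data = imp, FUN = mean)
  avg <- avg[order(avg$Overall, decreasing = TRUE), ]
  as.character(avg$var[seq_len(size)])
}

# Names of the variables for a tolerable subset size of 2
tolerable_vars <- top_vars(imp, size = 2)
```

With the real object, size would be the value returned by pickSizeTolerance. caret also has an update() method for rfe objects that takes a size argument and refits the final model at that subset size (see ?rfe), which may be the more direct route if refitting only the final model is affordable.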