I run caret's recursive feature selection (rfe) with randomForest. When running the rfe function with method repeatedcv, I had the parameter maximize = TRUE, so the optimal set of variables is chosen based on the best RMSE metric.
However, I would like to see the minimum "tolerable" set of predictor variables without rerunning rfe with parameter maximize = FALSE. It takes 24 hours to rerun my analysis.
Apparently, caret's function pickSizeTolerance does the trick, as described on caret's webpage: http://caret.r-forge.r-project.org/featureselection.html
How can I use the existing rfe object to get the "tolerable" set of variables?
Reproducible code:
library(caret)
inTrain <- createDataPartition(y = iris[,4],
                               p = .66,
                               list = FALSE)
training <- iris[ inTrain,]
testing <- iris[-inTrain,]
ctrl <- rfeControl(functions = rfFuncs, method = "repeatedcv", repeats = 5,
                   verbose = TRUE, returnResamp = "all")
rfProfile <- rfe(training[,-4], training[,4], sizes = c(2,3), rfeControl = ctrl, newdata = testing[,-4])
The object rfProfile$resample includes all the resampled metrics, but how do I calculate the tolerable subset size from it?
Best Answer
OK, the functions pickSizeTolerance and pickSizeBest are well documented in caret's ?rfFuncs. The reproducible code above can be extended following the documentation's example:
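As a minimal sketch of how these two functions behave, the snippet below applies them to a small hand-made table shaped like rfProfile$results (one row per subset size, with a Variables column and an RMSE column). The RMSE values and the 5% tolerance are invented for illustration and are not output from the rfe call above. The commented lines show how the same calls would presumably be applied to the existing rfe object, assuming caret's update method for rfe objects is used to refit at the chosen size.

```r
library(caret)

# Hypothetical stand-in for rfProfile$results: one row per subset size.
# The RMSE values are invented for illustration only.
res <- data.frame(Variables = c(2, 3, 4),
                  RMSE      = c(0.205, 0.200, 0.210))

# Subset size with the lowest RMSE; maximize = FALSE because a smaller
# RMSE is better.
best_size <- pickSizeBest(res, metric = "RMSE", maximize = FALSE)

# Smallest subset size whose RMSE is within tol percent of the best RMSE.
tol_size <- pickSizeTolerance(res, metric = "RMSE", tol = 5, maximize = FALSE)

# With the real fit, pass the stored results instead of the toy table:
# tol_size <- pickSizeTolerance(rfProfile$results, metric = "RMSE",
#                               tol = 5, maximize = FALSE)
#
# Assumption: update() on an rfe object refits only the final model at the
# requested size, reusing the stored variable rankings, so the resampling
# does not have to be repeated.
# rfSmall <- update(rfProfile, x = training[, -4], y = training[, 4],
#                   size = tol_size)
# predictors(rfSmall)
```

Because the functions only need the per-size summary table, they work on the results already stored in the rfe object, which is what avoids the 24-hour rerun. The tol argument is a percentage, so tol = 5 accepts any subset whose RMSE is at most 5% worse than the best one.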