I am computing an SVM-RFE model with the rfe function of the caret package, but I am a bit confused about the results. My code is:
library(caret)

## Report ROC/Sens/Spec together with Accuracy/Kappa
fiveStats <- function(...) c(twoClassSummary(...), defaultSummary(...))
svmFuncs <- caretFuncs
svmFuncs$summary <- fiveStats

## Outer resampling: 5 repeats of the folds created by createMultiFolds()
set.seed(345)
FSctrl <- rfeControl(method = "repeatedcv",
                     repeats = 5,
                     verbose = TRUE,
                     functions = svmFuncs,
                     index = createMultiFolds(TrData[, 1], times = 5),
                     saveDetails = TRUE)

## Inner resampling, used by train() to tune C
TRctrl <- trainControl(method = "LGOCV",
                       number = 50, p = 0.7,
                       savePredictions = TRUE,
                       classProbs = TRUE,
                       verboseIter = FALSE)

set.seed(921)
svmRFE_NG <- rfe(x = TrData[, 2:43],
                 y = TrData[, 1],
                 sizes = seq(1, 42),
                 metric = "ROC",
                 rfeControl = FSctrl,
                 ## Options to train()
                 method = "svmLinear",
                 tuneGrid = expand.grid(C = 10^(-2:2)),
                 preProc = c("center", "scale"),
                 ## Inner resampling process
                 trControl = TRctrl)
I would like to compute some average metrics (ROC curve, AUC, sensitivity…) from the cross-validation data (training), but I am not sure where to look: svmRFE_NG$pred:
> head(svmRFE_NG$pred)
pred BREAST LUNG obs Variables Resample rowIndex
predictions.1 LUNG 0.3075494 0.6924506 LUNG 42 Fold01.Rep1 33
predictions.2 LUNG 0.1106591 0.8893409 LUNG 42 Fold01.Rep1 37
predictions.3 LUNG 0.2504079 0.7495921 BREAST 42 Fold01.Rep1 41
predictions.4 LUNG 0.1174505 0.8825495 LUNG 42 Fold01.Rep1 44
predictions.5 LUNG 0.1238329 0.8761671 BREAST 42 Fold01.Rep1 46
predictions.6 LUNG 0.2917743 0.7082257 LUNG 41 Fold01.Rep1 33
or svmRFE_NG$fit$pred:
> head(svmRFE_NG$fit$pred)
pred obs BREAST LUNG rowIndex C Resample
1 BREAST BREAST 0.7434318 0.2565682 4 0.01 Resample01
2 LUNG LUNG 0.2731751 0.7268249 6 0.01 Resample01
3 LUNG BREAST 0.4431675 0.5568325 8 0.01 Resample01
4 BREAST BREAST 0.8306861 0.1693139 11 0.01 Resample01
5 BREAST BREAST 0.8404291 0.1595709 15 0.01 Resample01
6 LUNG LUNG 0.3936469 0.6063531 19 0.01 Resample01
To my knowledge, the final model is stored in svmRFE_NG$fit. Should I take these results (for C = best tuning parameter), or should I work with the svmRFE_NG$pred results (for Variables = optimal size)?
Best Answer
From looking at the RFE examples on Max's page, and at svmRFE_NG$resample and svmRFE_NG$pred$Resample (and their counterparts in svmRFE_NG$fit), I'd say this depends on which characteristics you want to look at.
svmRFE_NG seems to contain the cross-validation results of using different sets of variables, so it could be used for statistics about using different variables (consider e.g. svmRFE_NG$variables too). Not all information seems to be preserved here, though, such as the performance of a specific combination of variables, unless I simply overlooked it.
In contrast, svmRFE_NG$fit seems to contain the cross-validation results for different hyperparameters of the "final model" (the best-performing combination of features and hyperparameters). So those can be used for the more classic statistics about the final model you obtained from the whole process.
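As a rough illustration of the first option, here is a minimal base-R sketch that averages per-resample AUC, sensitivity and specificity for one subset size. It assumes a data frame with the columns shown in head(svmRFE_NG$pred) and that BREAST is the positive class. The helper names auc_rank and summarise_rfe_pred and the toy data are made up for this example and are not part of caret.

```r
# Rank-based (Mann-Whitney) AUC: probability that a random positive case
# gets a higher score than a random negative case (ties count as 0.5).
auc_rank <- function(score, is_pos) {
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  if (n_pos == 0 || n_neg == 0) return(NA_real_)
  r <- rank(score)  # average ranks handle ties
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: metrics per resample for one subset size, then their mean.
# `pred` is assumed to look like svmRFE_NG$pred (pred, obs, Variables, Resample,
# plus one probability column per class).
summarise_rfe_pred <- function(pred, size, positive = "BREAST") {
  kept <- pred[pred$Variables == size, , drop = FALSE]
  per_fold <- lapply(split(kept, kept$Resample, drop = TRUE), function(d) {
    is_pos   <- d$obs == positive
    pred_pos <- d$pred == positive
    c(ROC  = auc_rank(d[[positive]], is_pos),
      Sens = sum(pred_pos & is_pos) / sum(is_pos),
      Spec = sum(!pred_pos & !is_pos) / sum(!is_pos))
  })
  per_fold <- do.call(rbind, per_fold)
  list(per_resample = per_fold, mean = colMeans(per_fold, na.rm = TRUE))
}

# Toy stand-in for svmRFE_NG$pred (invented values, two resamples at size 2)
toy <- data.frame(
  pred      = c("BREAST", "LUNG", "LUNG", "BREAST",
                "BREAST", "BREAST", "LUNG", "LUNG", "LUNG", "LUNG"),
  BREAST    = c(0.8, 0.3, 0.4, 0.6, 0.9, 0.7, 0.2, 0.1, 0.3, 0.2),
  obs       = c("BREAST", "LUNG", "BREAST", "LUNG",
                "BREAST", "BREAST", "LUNG", "LUNG", "BREAST", "LUNG"),
  Variables = c(2, 2, 2, 2, 2, 2, 2, 2, 1, 1),
  Resample  = c(rep("Fold1.Rep1", 4), rep("Fold2.Rep1", 4), rep("Fold1.Rep1", 2)),
  stringsAsFactors = FALSE
)
toy$LUNG <- 1 - toy$BREAST

res <- summarise_rfe_pred(toy, size = 2)
res$mean  # mean ROC = 0.875, Sens = 0.75, Spec = 0.75 for this toy data
```

With the real object, the call would presumably be summarise_rfe_pred(svmRFE_NG$pred, svmRFE_NG$optsize). For the second option, svmRFE_NG$fit$pred has no Variables column, so the filter would instead be on C == svmRFE_NG$fit$bestTune$C before splitting by Resample.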