I have two questions relating to ROC AUC values in SVM training and testing.
- After training and testing an SVM in caret, I've found differences between the AUC values calculated by caret, pROC and the ggplot2 extension plotROC. The max AUC from training in caret is less than either AUC from testing. Is this normal? Intuitively I would have thought that testing AUC would be lower than in training because of some level of poor fitting to unseen data.
- Does anyone have an explanation for the differences between AUC from pROC and the ggplot2 extension plotROC that are both calculated on the testing prediction? I've had a look at the documentation for both pROC and plotROC (and the code for plotROC's calculate_roc function) but haven't been able to determine a reason. Or have I made a coding error in calculating the AUCs?
Reproducible example:
Load the GermanCredit dataset, which has 2 classes and various feature variables.
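library(caret)  # GermanCredit data, createDataPartition(), train()
library(dplyr)  # %>% and select()
data("GermanCredit")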
# Remove zero variance variables (prior knowledge)
gc <- GermanCredit %>% select(-Purpose.Vacation, -Personal.Female.Single)
Training/testing partition.
set.seed(71)
gc_i <- createDataPartition(gc$Class, p = 0.8, list = FALSE)
OPTIONAL: For parallel processing, register the number of cores (workers) and supply seeds for every resample so that results stay reproducible when run in parallel.
library(doMC)  # registerDoMC()
registerDoMC(cores = 2)
# Set seeds for reproducibility
## B = 5 repeats x 5 folds = 25 resamples, so the seeds list needs B + 1 = 26 elements; M = number of parameter combinations evaluated per resample
set.seed(456)
seeds1 <- vector(mode = "list", length = 26)
for(i in 1:25) seeds1[[i]] <- sample.int(1000, 25) ## 25 integers per resample, at least one per parameter combination (M)
## For the last model:
seeds1[[26]] <- sample.int(1000, 1)
Training
gc_ctrl1 <- trainControl(method = "repeatedcv",
                         number = 5,
                         repeats = 5,
                         classProbs = TRUE,
                         summaryFunction = twoClassSummary,
                         savePredictions = TRUE,
                         seeds = seeds1) # optional
gc_train1 <- train(Class~., gc[gc_i, ],
                   method = "svmRadial",
                   # train() uses its default here: an analytically derived estimate of sigma
                   tuneLength = 5, # requests 5 candidate values of C; sigma is held at the estimate above
                   trControl = gc_ctrl1,
                   preProc = c("center", "scale"),
                   metric = "ROC",
                   verbose = FALSE)
max(gc_train1$results[,"ROC"])
# 0.7800372
Testing with the best model from gc_train1.
# Confusion matrix using `caret::confusionMatrix`
gc_pred <- predict(gc_train1, newdata = gc[-gc_i,] %>% select(-Class), type = "raw")
gc_CM <- confusionMatrix(gc_pred, gc[-gc_i,]$Class)
ROC and AUC using pROC
# ROC using pROC
library(pROC)  # roc()
gc_prob <- predict(gc_train1, newdata = gc[-gc_i,] %>% select(-Class), type = "prob")
gc_pROC <- roc(response = gc[-gc_i,]$Class, predictor = gc_prob[, "Good"])
plot(gc_pROC)
gc_pROC$auc
# Area under the curve: 0.8376
ROC and AUC using plotROC (ggplot2 extension)
# ROC using plotROC (ggplot2 extension)
library(ggplot2)
library(plotROC)  # geom_roc(), calc_auc()
gc_prob_ex <- extractProb(list(gc_train1), gc[-gc_i,] %>% select(-Class))
gc_ggROC <- ggplot(gc_prob_ex, aes(d=obs, m=Good)) + geom_roc()
gc_ggROC_styled <- gc_ggROC + annotate("text", x = .75, y = .25,
                                       label = paste("AUC =", round(calc_auc(gc_ggROC)$AUC, 2)))
gc_ggROC_styled
# Area under the curve: 0.96
Best Answer
Q1.
There are many possible causes for this. Point estimates of evaluation metrics are noisy, and that noise can make performance on test data look better than on training data. On average, though, it is not reasonable to expect a model to behave better on data it has never seen than on data it was trained on. The higher AUC you see on test data might be an artifact created by the particular distribution of samples in a fold.
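As a rough illustration of how noisy a single AUC estimate can be, the sketch below simulates many test sets of 200 cases (140 positive, 60 negative, roughly the size of a 20% hold-out from GermanCredit) and looks at the spread of the resulting AUCs. The normally distributed scores, the class counts and the auc_rank helper are assumptions made for illustration; none of them come from the question's SVM.

```r
# AUC via the Mann-Whitney rank-sum identity (base R only).
auc_rank <- function(score, is_pos) {
  r <- rank(score)                      # average ranks handle ties
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical scorer: positives score 1 SD higher than negatives on average.
sim_auc <- function(n_pos = 140, n_neg = 60) {
  is_pos <- rep(c(TRUE, FALSE), c(n_pos, n_neg))
  score  <- rnorm(n_pos + n_neg, mean = ifelse(is_pos, 1, 0))
  auc_rank(score, is_pos)
}

set.seed(1)
aucs <- replicate(500, sim_auc())       # 500 independent test sets, same scorer

# Spread of the AUC estimate across test sets of this size
quantile(aucs, c(0.025, 0.5, 0.975))
```

If the interval printed at the end is several hundredths of AUC wide, a single test-set AUC that lands above the cross-validated estimate is not surprising by itself.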
Q2.
While I have never used ggplot like that, it seems gc_prob_ex receives the predictions from the train set and not from the test set.
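To show why scoring the training rows would inflate the AUC, here is a small base R sketch on simulated data. Everything in it is hypothetical: the data generator, the logistic regression standing in for the SVM, and the auc_rank helper are assumptions chosen so the example runs without extra packages.

```r
# AUC via the Mann-Whitney rank-sum identity (base R only).
auc_rank <- function(score, is_pos) {
  r <- rank(score)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical data: x1 is informative, the other 29 predictors are pure noise.
make_data <- function(n, p = 30) {
  x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
  y <- rbinom(n, 1, plogis(x[, 1]))
  data.frame(y = y, x)
}

set.seed(42)
train_df <- make_data(150)
test_df  <- make_data(2000)

# A model with many noise predictors fits the training rows too well.
fit <- glm(y ~ ., data = train_df, family = binomial)

# Same model, same AUC function; only the rows being scored differ.
auc_train <- auc_rank(predict(fit, train_df), train_df$y == 1)
auc_test  <- auc_rank(predict(fit, test_df),  test_df$y == 1)
c(train = auc_train, test = auc_test)
```

For the question's code, it may be worth checking which rows gc_prob_ex actually contains. According to the caret documentation, extractProb labels each row in a dataType column, so table(gc_prob_ex$dataType) should show this. Alternatively, the plotting data frame could be built directly from gc_prob and gc[-gc_i, ]$Class, the same inputs passed to pROC.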