Solved – ROC Area Under Curve (AUC) in SVM – different results between R functions

Tags: caret, classification, r, svm

I have two questions relating to ROC AUC values in SVM training and testing.

  1. After training and testing an SVM in caret, I've found differences between the AUC values calculated by caret, pROC, and the ggplot2 extension plotROC. The maximum AUC from training in caret is lower than either AUC from testing. Is this normal? Intuitively I would have expected the testing AUC to be lower than the training AUC because of some degree of poorer fit to unseen data.

  2. Does anyone have an explanation for the difference between the AUC from pROC and the AUC from the ggplot2 extension plotROC, both calculated on the test-set predictions? I've had a look at the documentation for both pROC and plotROC (and the code for plotROC's calculate_roc function) but haven't been able to determine a reason. Or have I made a coding error in calculating the AUCs?

Reproducible example:

Load the GermanCredit dataset, which has 2 classes and various feature variables.

data("GermanCredit") 
# Remove zero variance variables (prior knowledge)
gc <- GermanCredit %>% select(-Purpose.Vacation, -Personal.Female.Single)

Training/testing partition.

set.seed(71)
gc_i <- createDataPartition(gc$Class, p = 0.8, list = FALSE)
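
Optionally, a quick check that the stratified split keeps the class proportions similar in the training and test parts (base R, using the objects above):

prop.table(table(gc$Class))        # full data
prop.table(table(gc$Class[gc_i]))  # training split
prop.table(table(gc$Class[-gc_i])) # test split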

OPTIONAL: For parallel processing, register the number of cores (workers) and supply seeds for the resampling, since reproducibility requires explicit seeds when running in parallel.

library(doMC) # optional: parallel backend
registerDoMC(cores = 2)
# Set seeds for reproducibility when resampling in parallel
## Here B = (5 repeats of 5-fold CV) + 1 = 26 seed vectors are needed;
## each of the first 25 must contain at least as many seeds as models fitted per resample
set.seed(456)
seeds1 <- vector(mode = "list", length = 26)
for(i in 1:25) seeds1[[i]] <- sample.int(1000, 25) ## 25 seeds comfortably covers the tuneLength = 5 grid
## For the last (final) model:
seeds1[[26]] <- sample.int(1000, 1)
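
Optionally, a quick sanity check on the structure of the seed list (25 seed vectors for the resamples plus a single seed for the final fit):

lengths(seeds1)
# first 25 elements have length 25, the last has length 1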

Training

gc_ctrl1 <- trainControl(method = "repeatedcv",
                         number = 5,
                         repeats = 5,
                         classProbs = TRUE,
                         summaryFunction = twoClassSummary,
                         savePredictions = TRUE,
                         seeds = seeds1) # optional

gc_train1 <- train(Class ~ ., data = gc[gc_i, ],
                   method = "svmRadial",
                   # train() uses its default method of calculating an analytically derived estimate for sigma
                   tuneLength = 5, # 5 candidate values of C (sigma is held at the analytic estimate)
                   trControl = gc_ctrl1,
                   preProc = c("center", "scale"),
                   metric = "ROC",
                   verbose = FALSE)

max(gc_train1$results[,"ROC"])
# 0.7800372
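
To see which parameter combinations were actually evaluated and which one was selected, the fitted train object can be inspected directly:

gc_train1$results[, c("sigma", "C", "ROC")]
gc_train1$bestTune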

Using the best model from gc_train1 to predict on the held-out test set.

# Confusion matrix using `caret::confusionMatrix`
gc_pred <- predict(gc_train1, newdata = gc[-gc_i,] %>% select(-Class), type = "raw")
gc_CM <- confusionMatrix(gc_pred, gc[-gc_i,]$Class)

ROC and AUC using pROC

# ROC using pROC
gc_prob <- predict(gc_train1, newdata = gc[-gc_i,] %>% select(-Class), type = "prob")
gc_pROC <- roc(response = gc[-gc_i,]$Class, predictor = gc_prob[, "Good"])
plot(gc_pROC)
gc_pROC$auc
# Area under the curve: 0.8376
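
As context for Q1 below, pROC can also report a confidence interval for this single test-set estimate (DeLong method by default), which gives a sense of how noisy it is:

ci.auc(gc_pROC)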

ROC and AUC using plotROC (ggplot2 extension)

# ROC using plotROC (ggplot2 extension)
gc_prob_ex <- extractProb(list(gc_train1), gc[-gc_i,] %>% select(-Class))
gc_ggROC <- ggplot(gc_prob_ex, aes(d=obs, m=Good)) + geom_roc() 
gc_ggROC_styled <- gc_ggROC +  annotate("text", x = .75, y = .25, 
       label = paste("AUC =", round(calc_auc(gc_ggROC)$AUC, 2)))
gc_ggROC_styled
# Area under the curve: 0.96

(Figure: GermanCredit ROC curve from plotROC)

Best Answer

Q1.

There are many possible causes for this. Point estimates of evaluation metrics are noisy, and that noise can make performance on the test data look better than on the training data. On average, it is not reasonable to expect a model to behave better on data it has never seen than on the data it was trained on, but any single comparison can go either way. In particular, the ROC value reported by caret is an average over cross-validation resamples, while the test AUC is a single estimate on a relatively small held-out set, so it can land above the training average simply because of how the samples happened to fall into that particular split.
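
One quick check, using the objects already created in the question (gc_train1$resample holds the per-resample ROC values for the selected model under caret's default returnResamp = "final"):

# Spread of the cross-validated ROC values for the selected model,
# compared with the single test-set AUC from pROC
summary(gc_train1$resample$ROC)
quantile(gc_train1$resample$ROC, probs = c(0.025, 0.975))
as.numeric(gc_pROC$auc)
# If the test AUC falls inside or near this range, the gap is plausibly just sampling noise.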

Q2.

While I have never used ggplot like that, it seems gc_prob_ex receives the predictions from the training set and not (only) from the test set, so the 0.96 is largely a resubstitution AUC, which explains why it is so much higher than the pROC value computed on the test set alone.
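
Two ways to check this, sketched against the objects defined in the question and assuming extractProb labels rows via its dataType column ("Training"/"Test") as documented in caret:

# 1) Pass both testX and testY so the test rows are labelled, then keep only those rows
gc_prob_ex2 <- extractProb(list(gc_train1),
                           testX = gc[-gc_i, ] %>% select(-Class),
                           testY = gc[-gc_i, ]$Class)
table(gc_prob_ex2$dataType)   # how many rows are Training vs Test?

gc_test_rows <- subset(gc_prob_ex2, dataType == "Test")
gc_ggROC_test <- ggplot(gc_test_rows, aes(d = obs, m = Good)) + geom_roc()
calc_auc(gc_ggROC_test)$AUC   # should be close to the pROC value on the same test set

# 2) Skip extractProb and reuse the test-set probabilities computed for pROC
gc_test_df <- data.frame(obs = gc[-gc_i, ]$Class, Good = gc_prob[, "Good"])
gc_ggROC_direct <- ggplot(gc_test_df, aes(d = obs, m = Good)) + geom_roc()
calc_auc(gc_ggROC_direct)$AUC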