Solved – R caret classification – why doesn’t model accuracy equal accuracy given by predict()

Tags: accuracy, caret, classification, r

I have a dataset with 1000 samples, each belonging to one of 3 classes. I'm training classifiers on the dataset and predicting classes with 5-fold cross-validation, and I'd like to know how well each classifier is doing. To do so, I train each classifier with caret's train function, which reports an accuracy (caret::train(...)$results$Accuracy). I also calculate the accuracy manually from each classifier's predicted classes (via stats::predict()).

However, these two ways give different numbers. Why is there a difference? Which method should I use?

The code below reproduces the difference; here the gap is small, but on my real dataset it is much bigger, e.g. 80% vs. 100%.

# make data (seed set so the run is reproducible)
library(caret)
set.seed(1)
df = data.frame(x1 = runif(1000),
                x2 = runif(1000),
                label = character(1000), stringsAsFactors = F)
df$label[1:500] = "A"
df$label[501:900] = "B"
df$label[901:1000] = "C"
df$x1[df$label=="A"] = df$x1[df$label=="A"] - .25
df$x2[df$label=="B"] = df$x2[df$label=="B"] + .25
df$x1[df$label=="C"] = df$x1[df$label=="C"] + .125
df$x2[df$label=="C"] = df$x2[df$label=="C"] - .125

# classify: 5-fold CV; the outcome is passed as a factor, as caret expects for classification
ctrl = trainControl(method = "cv", number = 5, classProbs = F)
mod = caret::train(x = as.matrix(df[,1:2]), y = factor(df$label),
                   method = "svmLinear",
                   trControl = ctrl)

# cross-validated accuracy reported by caret (one row per tuning value; svmLinear fixes C = 1 by default)
mod.accuracy = max(mod$results$Accuracy)

# accuracy computed manually: the final model's predictions on the full dataset
preds = stats::predict(mod, as.matrix(df[,1:2]), type = "raw")
predict.accuracy = sum(preds == df$label) / nrow(df)

print(paste("Accuracy from mod$results$Accuracy is", mod.accuracy))
print(paste("Accuracy from predict() is", predict.accuracy))

> [1] "Accuracy from mod$results$Accuracy is 0.655"
> [1] "Accuracy from predict() is 0.667"

Best Answer

The value in mod$results$Accuracy is the cross-validation score: for each of the 5 train/test splits, a model is fitted on the training folds and scored on the held-out fold, and the five held-out accuracies are averaged. After that, caret refits a final model on the entire dataset, and that final model is the one predict() uses. So what you compute directly at the end is the training (resubstitution) accuracy of the final model: the model is scored on the very data it was fitted to, which is optimistically biased. That is why it can reach 100% on your real dataset while the CV score is 80%. Use the cross-validated accuracy as your estimate of performance on new data.
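
If you want to see this concretely, you can tell caret to keep the held-out predictions and recompute its score by hand. A minimal sketch, reusing df from the question; savePredictions = "final" and the recomputation below are the only additions:

# keep the held-out predictions for the final tuning parameters
ctrl = trainControl(method = "cv", number = 5, savePredictions = "final")
mod = caret::train(x = as.matrix(df[,1:2]), y = factor(df$label),
                   method = "svmLinear", trControl = ctrl)

# accuracy within each held-out fold, then the average over folds
fold.acc = sapply(split(mod$pred, mod$pred$Resample),
                  function(p) mean(p$pred == p$obs))
mean(fold.acc)           # matches mod$results$Accuracy (folds are equal-sized here)
mod$results$Accuracy

# caret can also aggregate the held-out predictions into a confusion matrix
confusionMatrix(mod)

The CV number will vary from run to run unless you set a seed before train(), but it will typically sit below the resubstitution accuracy you get from calling predict() on the training data.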