Solved – Using the caret package, is it possible to obtain confusion matrices for specific threshold values?

caret, classification, confusion matrix, r, roc

I've fitted a logistic regression model (via train) for a binary response, and I've obtained its confusion matrix via confusionMatrix in caret. I'm not sure what threshold is being used to obtain it, though. How do I obtain the confusion matrix for specific threshold values using confusionMatrix in caret?

Best Answer

Most classification models in R produce both a class prediction and the probabilities for each class. For binary data, in almost every case, the class prediction is based on a 50% probability cutoff.

glm is the same. With caret, using predict(object, newdata) gives you the predicted class and predict(object, newdata, type = "prob") will give you class-specific probabilities (when object is generated by train).

You can change this by defining your own custom model and applying whatever cutoff you want. The caret website also has an example that uses resampling to optimize the probability cutoff.
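A simpler route than a custom model, if you only need the confusion matrix at a given threshold, is to threshold the class probabilities yourself and pass the resulting classes to confusionMatrix. What follows is a minimal sketch of that idea, not the answer's own code: the simulated data, the variable names (dat, fit, cutoff, pred_custom) and the 0.3 cutoff are all assumptions made for illustration. Depending on your caret version, confusionMatrix may also need the e1071 package installed.

```r
library(caret)

# Simulated two-class data; everything here is illustrative, not from the question.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- factor(ifelse(runif(n) < plogis(1.5 * x), "yes", "no"),
            levels = c("no", "yes"))
dat <- data.frame(x = x, y = y)

# Logistic regression through train(); resampling is switched off to keep it quick.
fit <- train(y ~ x, data = dat, method = "glm",
             trControl = trainControl(method = "none"))

# Class probabilities: a data frame with one column per class level.
probs <- predict(fit, newdata = dat, type = "prob")

# Apply a custom cutoff (assumed value) to the probability of the "yes" class.
cutoff <- 0.3
pred_custom <- factor(ifelse(probs[["yes"]] >= cutoff, "yes", "no"),
                      levels = levels(dat$y))

# Confusion matrix for the custom cutoff.
cm_custom <- confusionMatrix(data = pred_custom, reference = dat$y,
                             positive = "yes")
cm_custom$table
```

To compare several thresholds, the last two steps can be wrapped in a function and applied over a vector of cutoffs with lapply. Note that a confusion matrix computed on the training data, as here, will be optimistic; held-out data is preferable when it is available.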

tl;dr

confusionMatrix uses the predicted classes and thus a 50% probability cutoff

Max