How to calculate ROC AUC for a classification algorithm such as random forest

Tags: auc, classification, machine learning, random forest

In the question "In classification with 2-classes, can a higher accuracy leads to a lower ROC-AUC?", AdamO said that ROC AUC is not available for random forest because the algorithm has no cut-off value, and that ROC AUC can only be calculated when the algorithm returns a single continuous probability value for an unseen element.

But in R and Python we very often calculate ROC AUC after obtaining the predicted results, for example with pROC::auc in R or roc_auc_score in sklearn, which suggests that ROC AUC is available for all algorithms.

Summary: Why is ROC AUC not available for random forest (when classifying a True/False label)?

The idea of AUC is that changing the cut-off value changes the assignment. For instance, in regression, if the predicted value for an object is 0.75 and the cut-off value is 0.8, the object is assigned to False (0); if the cut-off value is 0.6, it is assigned to True (1).
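A minimal Python sketch of this cut-off idea, using only the standard library. The function names assign and roc_point, and the scores and labels, are made up for illustration; only the value 0.75 and the cut-offs 0.8 and 0.6 come from the example above.

```python
def assign(score, cutoff):
    """Return 1 (True) if the score reaches the cut-off, else 0 (False)."""
    return 1 if score >= cutoff else 0


def roc_point(scores, labels, cutoff):
    """Return (false positive rate, true positive rate) at one cut-off."""
    preds = [assign(s, cutoff) for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos


# The single object from the text: the same score, two different cut-offs
for cutoff in (0.8, 0.6):
    print(f"cut-off {cutoff}: assigned to {assign(0.75, cutoff)}")

# Hypothetical scores and true labels, invented for this sketch
scores = [0.9, 0.75, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

# Each cut-off gives one point of the ROC curve
for cutoff in (0.8, 0.5, 0.0):
    print(cutoff, roc_point(scores, labels, cutoff))
```

Sweeping the cut-off over all values traces the full ROC curve, and the AUC is the area under it.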

But for randomForest, the assignment never changes.

For instance, with the function predict_proba, sklearn returns a list of probabilities, one for each class, rather than a single probability.

So, let's say sklearn returns the following for an unseen element:

True        False
0.21        0.19

Whatever the cut-off value, the assignment will always be True and never change.

Best Answer

Although the randomForest package does not have a built-in function to generate a ROC curve and an AUC measure, both are very easy to obtain for a 2-class problem by combining it with the package pROC.

The randomForest function creates an object that contains votes. For each data point, these votes come from the classifications made by the trees for which that point was Out of Bag (OOB). The votes roughly represent a probability, and can therefore be used to create a ROC curve and an AUC measure.
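To illustrate why vote fractions are enough, here is a rough standard-library Python sketch. It is not how randomForest or pROC compute things internally: the helper names vote_fraction and auc_from_scores and the vote data are hypothetical, and every point gets the same number of trees, whereas real OOB votes come from a different subset of trees for each point. The AUC is computed as the share of (positive, negative) pairs in which the positive has the higher score, with ties counted as one half.

```python
def vote_fraction(tree_votes):
    """Fraction of trees that voted for the positive class (coded as 1)."""
    return sum(tree_votes) / len(tree_votes)


def auc_from_scores(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-tree votes (1 = positive class) for four data points
votes = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
]
labels = [1, 1, 0, 0]  # hypothetical true classes

# Continuous scores in [0, 1], one per data point
scores = [vote_fraction(v) for v in votes]
print(scores)
print(auc_from_scores(scores, labels))
```

Because the vote fraction is a single continuous score per data point, any cut-off between 0 and 1 can be applied to it, which is all a ROC curve needs.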

Here is an example, assuming the packages randomForest and pROC are installed:

library(randomForest)
library(pROC)
data(iris)

# Drop one class to make this a 2-class problem
iris <- iris[iris$Species != "setosa", ]
iris$Species <- droplevels(iris$Species)

set.seed(71)
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 10)

# Use the OOB vote fraction for the second class as the score
rf.roc <- roc(iris$Species, iris.rf$votes[, 2])
plot(rf.roc)
auc(rf.roc)
