Machine Learning – How ROC of Binary Classifier is Affected by Inverted Class Labels

auc, binary data, classification, machine learning, roc

In some domains the negative/positive class labels are well defined (pre-assigned).

In a general binary classification problem, however, we can assign the class labels in either of two ways.

Given a model trained on the original label assignment (and its ROC curve), what can we infer about the ROC curve (and the AUC in particular) of a model trained, with the same algorithm, on the inverted class assignment?

Best Answer

If you invert the labels, you have $$\mathrm{TPR}_1 = \mathrm{TNR}_{\text{inv}} = 1-\mathrm{FPR}_{\text{inv}}$$ and $$\mathrm{FPR}_1= \mathrm{FNR}_{\text{inv}} = 1-\mathrm{TPR}_{\text{inv}},$$ where by $X_1$ I mean a property before inverting, and by $X_{\text{inv}}$ a property after inverting.

The ROC curve plots TPR vs. FPR, so inverting the labels simply flips your diagram: the new curve plots $\mathrm{TPR}_{\text{inv}} = 1-\mathrm{FPR}_1$ vs. $\mathrm{FPR}_{\text{inv}}=1-\mathrm{TPR}_1$. The AUC is unchanged. (AUC measures discriminative performance, so it would be worrying if it changed when you inverted the labels. AUC is also known to be relatively unaffected by class imbalance, i.e., $P(Y=1)$, which is also inverted when you invert the labels.)
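A minimal numeric sketch of this invariance, using the rank-statistic definition of AUC. The labels and scores below are illustrative toy values, and the sketch assumes the model retrained on inverted labels would produce the complements of the original scores (a symmetric problem with the same algorithm):

```python
def auc(labels, scores):
    """AUC as the Mann-Whitney U statistic: the probability that a
    randomly chosen positive outranks a randomly chosen negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores for a hypothetical model on the original assignment.
y = [0, 0, 1, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.5, 0.9]

# Invert the labels; assume the retrained model scores are complemented.
y_inv = [1 - yi for yi in y]
s_inv = [1 - si for si in s]

print(auc(y, s))          # 0.777...
print(auc(y_inv, s_inv))  # identical
```

Every pairwise comparison between a positive and a negative is preserved under the complement, which is exactly why the area does not change.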

Note that in practice the results may still differ slightly due to randomness and/or implementation details of model training.