Solved – When is an AUC score misleadingly high?

Tags: auc, metric, roc

I have an algorithm that gives an AUC (area under the receiver operating characteristic curve) of 0.94.

I mean, this is amazing, but… probably too amazing, considering the difficulty of the task I am working on. So how can I tell if the AUC is valid, or misleadingly high?

(P.S. yes, I am training on the training set and testing on a completely separate test set.)

Best Answer

One possible reason you can get a high AUROC with what some might consider a mediocre prediction is imbalanced data (heavily skewed toward the "zero" class) combined with high recall and low precision. That is, most of the ones sit at the high end of your predicted probabilities, but most of the cases at that high end are still zeroes. This works because the ROC curve gets most of its "lift" early in the plot, i.e., while only a small fraction of the zeroes have been passed.

For example, if 5% of the test set are "ones" and all of the ones appear in the top 10% of your predictions, then your AUC will be at least 18/19 ≈ 0.947, even if the top 5% of predictions are all zeroes. In that worst case, the true positive rate reaches 100% while the false positive rate is still only 1/19 (only the zeroes sitting in the top 10% have been passed), so at least 18/19 of the area under the curve is already locked in.
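As a sanity check, here is a minimal sketch of that worst case (the 1000-example size and the exact placement of the ones are made up for illustration; sklearn's roc_auc_score is assumed):

import numpy as np
from sklearn.metrics import roc_auc_score

# 1000 examples, 5% ones; the top 5% of scores are all zeroes and the
# ones occupy the next 5%, so every one is inside the top 10%.
yTrue = np.zeros(1000, dtype=int)
yTrue[50:100] = 1
yScore = np.linspace(1.0, 0.0, num=1000)   # strictly decreasing scores

roc_auc_score(yTrue, yScore)   # ~0.947, i.e. 18/19

Even with every one deliberately pushed below a block of zeroes, the score stays above 0.94.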

A simple Python example:

import numpy as np
import sklearn.metrics

# Imbalanced labels: only 4 of the 31 test cases are ones
yTest = [0,0,1,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
# Strictly decreasing scores, so the ones all sit near the top of the ranking
yPredicted = np.linspace(0.9, 0.1, num=len(yTest))
sklearn.metrics.roc_auc_score(yTest, yPredicted)  # ~0.89

import matplotlib.pyplot as plt
fpr, tpr, threshold = sklearn.metrics.roc_curve(yTest, yPredicted)
plt.plot(fpr, tpr)
plt.show()

[ROC curve produced by the Python code sample]
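For contrast, a precision-oriented summary of the very same ranking is far less flattering. A minimal sketch, assuming sklearn's average_precision_score (which summarizes the precision-recall curve) and reusing the data from above:

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

yTest = [0,0,1,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
yPredicted = np.linspace(0.9, 0.1, num=len(yTest))

roc_auc_score(yTest, yPredicted)            # ~0.89
average_precision_score(yTest, yPredicted)  # ~0.44, much harsher than the ROC view

With only 4 ones among 31 cases, the precision-recall score is dragged down by the zeroes interleaved near the top of the ranking, which is exactly the high-recall, low-precision situation described above.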

Whether this is a "bad" prediction depends on your priorities. If you think that false negatives are terrible and false positives are tolerable, then this prediction is okay. But if it's the opposite, then this prediction is pretty bad.
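To make that trade-off concrete, here is a rough sketch of the confusion matrix for the example above at a hand-picked cutoff (0.68 is arbitrary, chosen only so that the top nine scores get flagged as positive):

import numpy as np
from sklearn.metrics import confusion_matrix

yTest = [0,0,1,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
yPredicted = np.linspace(0.9, 0.1, num=len(yTest))

# Flag the top nine scores as positive (cutoff picked by hand for illustration)
yHat = (yPredicted >= 0.68).astype(int)

# Rows = true class, columns = predicted class:
# [[TN FP]    [[22  5]
#  [FN TP]] =  [ 0  4]]
confusion_matrix(yTest, yHat)

Zero false negatives but five false positives against only four true positives: fine if misses are what you fear, poor if false alarms are what you pay for.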