Solved – Area between the ROC curve and the Random Guessing Line

auc, classification, machine learning, roc

How close is my classifier to random guessing?

I need a single-number evaluation metric that quantifies a binary classifier's inability to do better than random guessing.

The random guessing line (RGL) from (0,0) to (1,1) has an AUC of 0.5. But so does the blue curve (grey area).

[Figure: ROC curve and the random guessing line (RGL)]

Wouldn't it be more suitable to use the area between the RGL and the ROC curve to estimate how "close" a classifier is to actual random guessing?
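The proposal can be sketched as the integral of |TPR - FPR| over FPR, computed next to the ordinary trapezoidal AUC. This is a minimal sketch, assuming the ROC curve is given as piecewise-linear (FPR, TPR) points sorted by FPR; the function names and the "crossing" curve's coordinates are hypothetical and are not read off the figure.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear ROC curve (points sorted by FPR)."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


def area_between_roc_and_diagonal(fpr, tpr):
    """Unsigned area between a piecewise-linear ROC curve and the line TPR = FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        d0 = tpr[i - 1] - fpr[i - 1]  # vertical gap to the diagonal at segment start
        d1 = tpr[i] - fpr[i]          # vertical gap to the diagonal at segment end
        if d0 * d1 >= 0:
            # Segment stays on one side of (or touches) the diagonal: one trapezoid.
            area += width * (abs(d0) + abs(d1)) / 2
        else:
            # Segment crosses the diagonal: two triangles meeting at the crossing point.
            area += width * (d0 * d0 + d1 * d1) / (2 * (abs(d0) + abs(d1)))
    return area


# Random guessing line, and a hypothetical curve that starts below the diagonal
# and then crosses above it (made-up points for illustration).
rgl = ([0.0, 1.0], [0.0, 1.0])
crossing = ([0.0, 0.375, 0.625, 1.0], [0.0, 0.125, 0.875, 1.0])

for name, (fpr, tpr) in [("RGL", rgl), ("crossing", crossing)]:
    print(name, trapezoid_auc(fpr, tpr), area_between_roc_and_diagonal(fpr, tpr))
```

Both curves get an AUC of 0.5, while the unsigned area is 0 for the RGL and 0.125 for the crossing curve. One property worth noting: because the gap is taken in absolute value, a curve lying entirely below the diagonal scores the same as one lying equally far above it.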


Best Answer

First, the 0.5 random-guess line is just a visual reference; what we really want to know is how well the classifier performs overall. I'm also not sure how your proposed method would give a different result, assuming that you subtract the first section that lies below the 0.5 line.
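The point about subtracting the section below the line can be made concrete: the area above the diagonal minus the area below it is exactly AUC - 0.5, because the diagonal alone encloses 0.5. A minimal sketch, assuming piecewise-linear (FPR, TPR) points sorted by FPR; both curves and the function names are hypothetical.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear ROC curve (points sorted by FPR)."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


def signed_area_vs_diagonal(fpr, tpr):
    """Area above the diagonal minus area below it: the integral of (TPR - FPR) d(FPR)."""
    return sum(
        (fpr[i] - fpr[i - 1]) * ((tpr[i] - fpr[i]) + (tpr[i - 1] - fpr[i - 1])) / 2
        for i in range(1, len(fpr))
    )


# Hypothetical curves: one crossing the diagonal, one clearly better than random.
curves = {
    "crossing": ([0.0, 0.375, 0.625, 1.0], [0.0, 0.125, 0.875, 1.0]),
    "better": ([0.0, 0.125, 1.0], [0.0, 0.75, 1.0]),
}

for name, (fpr, tpr) in curves.items():
    auc = trapezoid_auc(fpr, tpr)
    # The last two columns agree: signed area equals AUC - 0.5.
    print(name, auc, signed_area_vs_diagonal(fpr, tpr), auc - 0.5)
```

Since the signed area is only a constant shift of the AUC (rescaled as 2 × AUC - 1 it is the Gini coefficient), it ranks classifiers exactly as the AUC does.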

Second, the AUC also has a nice statistical property: it is equivalent to the Wilcoxon-Mann-Whitney U statistic normalized by the number of positive-negative pairs, U / (n_pos × n_neg). This is the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance (with ties counted as one half).
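The equivalence can be checked numerically. This minimal sketch computes the AUC twice on a small made-up set of scores: once as the fraction of (positive, negative) pairs ordered correctly, with ties counted as half (that is, U / (n_pos × n_neg)), and once as the trapezoidal area under the ROC curve obtained by sweeping the threshold over every distinct score. The scores and function names are illustrative assumptions.

```python
def pairwise_auc(pos_scores, neg_scores):
    """P(random positive scores higher than random negative), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Dividing the U count by the number of pairs gives a probability.
    return wins / (len(pos_scores) * len(neg_scores))


def roc_points(pos_scores, neg_scores):
    """(FPR, TPR) points from sweeping the threshold over each distinct score, high to low."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    fpr, tpr = [0.0], [0.0]
    for t in thresholds:
        # Predict "positive" for every score >= t.
        tpr.append(sum(s >= t for s in pos_scores) / len(pos_scores))
        fpr.append(sum(s >= t for s in neg_scores) / len(neg_scores))
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear ROC curve (points sorted by FPR)."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Made-up classifier scores; one positive and one negative tie at 0.4.
pos = [0.9, 0.8, 0.4, 0.35, 0.3]
neg = [0.7, 0.6, 0.4, 0.2, 0.1]

print(pairwise_auc(pos, neg), trapezoid_auc(*roc_points(pos, neg)))
```

Both give 0.66 here (16.5 correctly ordered pairs out of 25), up to floating-point rounding. The tied pair produces a diagonal ROC segment, which the trapezoidal rule credits with exactly half a win. The double loop is O(n_pos × n_neg); a rank-based formula scales better, but the loop keeps the probability interpretation explicit.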

In this case, we don't need the AUC value to see that this is a poor classifier (and one unlikely to be seen in practice, since it implies that true positives are consistently scored below a certain probability threshold while false positives are more evenly distributed).