Why is the area under the ROC curve the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one?
How does one prove this statement mathematically using integrals, given the CDFs and PDFs of the score distributions of the positive and negative classes?
Solved – How to derive the probabilistic interpretation of the AUC
Tags: auc, probability, roc
Best Answer
First thing, let's try to define the area under the ROC curve formally. Some assumptions and definitions:
We have a probabilistic classifier that outputs a "score" s(x), where x are the features and s is a generic monotonically increasing function of the estimated probability p(class = 1|x).
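Because s only needs to be a monotonically increasing function of the estimated probability, any such transform preserves the pairwise ordering of the scores and therefore leaves the AUC unchanged. A minimal sketch of this invariance (the Beta-distributed scores and the `prob_lower` helper are illustrative assumptions, not part of the answer):

```python
import math
import random

def prob_lower(scores_a, scores_b):
    """Fraction of (a, b) pairs with a < b, counting ties as 1/2 --
    the rank-based estimate of P(S_a < S_b), i.e. an empirical AUC."""
    wins = sum((a < b) + 0.5 * (a == b) for a in scores_a for b in scores_b)
    return wins / (len(scores_a) * len(scores_b))

random.seed(0)
# Illustrative estimated probabilities p(class = 1 | x) for two groups.
p_a = [random.betavariate(2, 5) for _ in range(200)]
p_b = [random.betavariate(5, 2) for _ in range(200)]

# Log-odds is a monotonically increasing transform of p, so it preserves
# every pairwise comparison -- and therefore the AUC -- exactly.
logit = lambda p: math.log(p / (1.0 - p))
auc_raw = prob_lower(p_a, p_b)
auc_logit = prob_lower([logit(p) for p in p_a], [logit(p) for p in p_b])
assert auc_raw == auc_logit
```

This is why the ROC curve and its area depend only on the ranking induced by the classifier, not on the calibration of its probabilities.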
$f_{k}(s)$, with $k \in \{0, 1\}$ := pdf of the scores for class $k$, with CDF $F_{k}(s)$
A new observation is classified by comparing its score s to a threshold t.
Furthermore, for mathematical convenience, let the positive class (the event to detect) be k = 0 and the negative class k = 1; since s increases with p(class = 1|x), an observation is classified as positive when s ≤ t. In this setting we can define, for each threshold t:
True positive rate: $TPR(t) = P(s \le t \mid \text{class } 0) = F_{0}(t)$
False positive rate: $FPR(t) = P(s \le t \mid \text{class } 1) = F_{1}(t)$
The ROC curve is then a plot of $F_{0}(t)$ against $F_{1}(t)$. Setting $v = F_1(s)$, so that $s = F_{1}^{-1}(v)$, we can formally define the area under the ROC curve as: $$AUC =\int_{0}^{1} F_{0}(F_{1}^{-1}(v))\, dv$$ Changing variable ($dv = f_{1}(s)\,ds$; as $v$ runs from 0 to 1, $s$ runs over the whole real line): $$AUC =\int_{-\infty}^{\infty} F_{0}(s) f_{1}(s)\,ds$$
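As a numerical sanity check, both integral forms can be evaluated for an assumed pair of Gaussian score distributions, $S_0 \sim N(0,1)$ and $S_1 \sim N(1,1)$, for which $P(S_0 < S_1) = \Phi(1/\sqrt{2})$ in closed form (these distributions and the helper names are assumptions for illustration). A stdlib-only sketch:

```python
import math

def norm_cdf(x, mu=0.0):
    """CDF of a Normal(mu, 1) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def norm_pdf(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

# Assumed score distributions: class 0 (positive) ~ N(0,1), class 1 ~ N(1,1).
F0 = lambda s: norm_cdf(s, 0.0)
F1 = lambda s: norm_cdf(s, 1.0)
f1 = lambda s: norm_pdf(s, 1.0)

def F1_inv(v):
    """Invert the strictly increasing F1 by bisection."""
    lo, hi = -12.0, 13.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if F1(mid) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def midpoint(g, a, b, n):
    """Midpoint-rule quadrature of g on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# First form: integral over v in [0, 1] of F0(F1^{-1}(v)).
auc_v = midpoint(lambda v: F0(F1_inv(v)), 0.0, 1.0, 20_000)
# Second form, after the change of variable: integral of F0(s) f1(s) ds.
auc_s = midpoint(lambda s: F0(s) * f1(s), -12.0, 13.0, 100_000)
# Closed form for these Gaussians: P(S0 < S1) = Phi(1 / sqrt(2)).
exact = norm_cdf(1.0 / math.sqrt(2.0))
```

Both quadratures agree with the closed form, which confirms that the change of variable did not alter the value of the integral.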
This formula can easily be seen to be the probability that a randomly drawn member of class 0 will produce a score lower than the score of a randomly drawn member of class 1: since $F_{0}(s) = P(S_0 \le s)$, the integral is $E_{S_1}[P(S_0 \le S_1)] = P(S_0 < S_1)$ (the scores being continuous, ties occur with probability zero).
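The same identity holds exactly for finite samples: the trapezoidal area under the empirical ROC curve ($\hat{F}_0$ plotted against $\hat{F}_1$) equals the fraction of (class 0, class 1) score pairs in which the class-0 score is lower. A small sketch with simulated Gaussian scores (an assumption for illustration):

```python
import random

random.seed(1)
# Simulated scores under the answer's convention: class 0 (positive)
# tends to score lower than class 1 (negative).
s0 = [random.gauss(0.0, 1.0) for _ in range(300)]
s1 = [random.gauss(1.0, 1.0) for _ in range(300)]

# Empirical ROC: sweep the threshold over every observed score and record
# (F1_hat(t), F0_hat(t)); area via the trapezoidal rule.
points = [(0.0, 0.0)]
for t in sorted(set(s0 + s1)):
    x = sum(s <= t for s in s1) / len(s1)  # empirical F1(t)
    y = sum(s <= t for s in s0) / len(s0)  # empirical F0(t)
    points.append((x, y))
area = sum((x2 - x1) * (y1 + y2) / 2.0
           for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Rank-based estimate: fraction of pairs with the class-0 score lower.
pairs = sum((a < b) + 0.5 * (a == b) for a in s0 for b in s1)
p_rank = pairs / (len(s0) * len(s1))

assert abs(area - p_rank) < 1e-9
```

This pairwise fraction is the Mann-Whitney U statistic divided by the number of pairs, which is the usual nonparametric route to the same probabilistic interpretation of the AUC.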
This proof is taken from: https://pdfs.semanticscholar.org/1fcb/f15898db36990f651c1e5cdc0b405855de2c.pdf