Why is the area under the ROC curve the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one?
How does one prove this statement mathematically using integrals, given the CDFs and PDFs of the score distributions of the positive and negative classes?
Solved – How to derive the probabilistic interpretation of the AUC
Tags: auc, probability, roc
Best Answer
First thing, let's try to define the area under the ROC curve formally. Some assumptions and definitions:
We have a probabilistic classifier that outputs a "score" s(x), where x are the features and s is a generic monotonically increasing function of the estimated probability p(class = 1|x).
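Because s only needs to be a monotonically increasing function of the estimated probability, any such transform preserves the pairwise ordering of the scores and therefore leaves the AUC unchanged. A minimal sketch of this invariance (the Beta-distributed scores and the `prob_lower` helper are illustrative assumptions, not part of the answer):

```python
import math
import random

def prob_lower(scores_a, scores_b):
    """Fraction of (a, b) pairs with a < b, counting ties as 1/2 --
    the rank-based estimate of P(S_a < S_b), i.e. an empirical AUC."""
    wins = sum((a < b) + 0.5 * (a == b) for a in scores_a for b in scores_b)
    return wins / (len(scores_a) * len(scores_b))

random.seed(0)
# Illustrative estimated probabilities p(class = 1 | x) for two groups.
p_a = [random.betavariate(2, 5) for _ in range(200)]
p_b = [random.betavariate(5, 2) for _ in range(200)]

# Log-odds is a monotonically increasing transform of p, so it preserves
# every pairwise comparison -- and therefore the AUC -- exactly.
logit = lambda p: math.log(p / (1.0 - p))
auc_raw = prob_lower(p_a, p_b)
auc_logit = prob_lower([logit(p) for p in p_a], [logit(p) for p in p_b])
assert auc_raw == auc_logit
```

This is why the ROC curve and its area depend only on the ranking induced by the classifier, not on the calibration of its probabilities.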
$f_{k}(s)$, with $k \in \{0, 1\}$ := pdf of the scores for class $k$, with CDF $F_{k}(s)$
A new observation is classified by comparing its score s to a threshold t.
Furthermore, for mathematical convenience, let the positive class (the event to detect) be k = 0 and the negative class k = 1; since s increases with p(class = 1|x), an observation is classified as positive when s ≤ t. In this setting we can define, for each threshold t:
True positive rate: $TPR(t) = P(s \le t \mid \text{class } 0) = F_{0}(t)$
False positive rate: $FPR(t) = P(s \le t \mid \text{class } 1) = F_{1}(t)$
The ROC curve is then a plot of $F_{0}(t)$ against $F_{1}(t)$. Setting $v = F_1(s)$, so that $s = F_{1}^{-1}(v)$, we can formally define the area under the ROC curve as: $$AUC =\int_{0}^{1} F_{0}(F_{1}^{-1}(v))\, dv$$ Changing variable ($dv = f_{1}(s)\,ds$; as $v$ runs from 0 to 1, $s$ runs over the whole real line): $$AUC =\int_{-\infty}^{\infty} F_{0}(s) f_{1}(s)\,ds$$
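As a numerical sanity check, both integral forms can be evaluated for an assumed pair of Gaussian score distributions, $S_0 \sim N(0,1)$ and $S_1 \sim N(1,1)$, for which $P(S_0 < S_1) = \Phi(1/\sqrt{2})$ in closed form (these distributions and the helper names are assumptions for illustration). A stdlib-only sketch:

```python
import math

def norm_cdf(x, mu=0.0):
    """CDF of a Normal(mu, 1) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def norm_pdf(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

# Assumed score distributions: class 0 (positive) ~ N(0,1), class 1 ~ N(1,1).
F0 = lambda s: norm_cdf(s, 0.0)
F1 = lambda s: norm_cdf(s, 1.0)
f1 = lambda s: norm_pdf(s, 1.0)

def F1_inv(v):
    """Invert the strictly increasing F1 by bisection."""
    lo, hi = -12.0, 13.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if F1(mid) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def midpoint(g, a, b, n):
    """Midpoint-rule quadrature of g on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# First form: integral over v in [0, 1] of F0(F1^{-1}(v)).
auc_v = midpoint(lambda v: F0(F1_inv(v)), 0.0, 1.0, 20_000)
# Second form, after the change of variable: integral of F0(s) f1(s) ds.
auc_s = midpoint(lambda s: F0(s) * f1(s), -12.0, 13.0, 100_000)
# Closed form for these Gaussians: P(S0 < S1) = Phi(1 / sqrt(2)).
exact = norm_cdf(1.0 / math.sqrt(2.0))
```

Both quadratures agree with the closed form, which confirms that the change of variable did not alter the value of the integral.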
This formula can easily be seen to be the probability that a randomly drawn member of class 0 will produce a score lower than the score of a randomly drawn member of class 1: since $F_{0}(s) = P(S_0 \le s)$, the integral is $E_{S_1}[P(S_0 \le S_1)] = P(S_0 < S_1)$ (the scores being continuous, ties occur with probability zero).
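The same identity holds exactly for finite samples: the trapezoidal area under the empirical ROC curve ($\hat{F}_0$ plotted against $\hat{F}_1$) equals the fraction of (class 0, class 1) score pairs in which the class-0 score is lower. A small sketch with simulated Gaussian scores (an assumption for illustration):

```python
import random

random.seed(1)
# Simulated scores under the answer's convention: class 0 (positive)
# tends to score lower than class 1 (negative).
s0 = [random.gauss(0.0, 1.0) for _ in range(300)]
s1 = [random.gauss(1.0, 1.0) for _ in range(300)]

# Empirical ROC: sweep the threshold over every observed score and record
# (F1_hat(t), F0_hat(t)); area via the trapezoidal rule.
points = [(0.0, 0.0)]
for t in sorted(set(s0 + s1)):
    x = sum(s <= t for s in s1) / len(s1)  # empirical F1(t)
    y = sum(s <= t for s in s0) / len(s0)  # empirical F0(t)
    points.append((x, y))
area = sum((x2 - x1) * (y1 + y2) / 2.0
           for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Rank-based estimate: fraction of pairs with the class-0 score lower.
pairs = sum((a < b) + 0.5 * (a == b) for a in s0 for b in s1)
p_rank = pairs / (len(s0) * len(s1))

assert abs(area - p_rank) < 1e-9
```

This pairwise fraction is the Mann-Whitney U statistic divided by the number of pairs, which is the usual nonparametric route to the same probabilistic interpretation of the AUC.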
This proof is taken from: https://pdfs.semanticscholar.org/1fcb/f15898db36990f651c1e5cdc0b405855de2c.pdf