Machine Learning – Why ROC Curves Are Insensitive to Class Distributions

machine-learning · precision-recall · roc

I am confused about why ROC is invariant under the class distribution, as described in the paper An Introduction to ROC Analysis. I cannot understand the paper's example of why the proportion of positive to negative classes in a test set does not affect the ROC curve.

Also, to quote from this post:

To show this, first let's start with a very nice way to define precision, recall and specificity. Assume you have a "positive" class called 1 and a "negative" class called 0. $\hat{Y}$ is your estimate of the true class label $Y$. Then:
$$
\begin{aligned}
\text{Precision} &= P(Y = 1 \mid \hat{Y} = 1) \\
\text{Recall} = \text{Sensitivity} &= P(\hat{Y} = 1 \mid Y = 1) \\
\text{Specificity} &= P(\hat{Y} = 0 \mid Y = 0)
\end{aligned}
$$

The key thing to note is that sensitivity/recall and specificity, which make up the ROC curve, are probabilities conditioned on the true class label. Therefore, they will be the same regardless of what $P(Y = 1)$ is.
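To convince myself of this, I wrote a small check (my own sketch, not from the quoted post): duplicating every negative sample changes the class distribution, but it leaves sensitivity and specificity unchanged while it shifts precision.

```python
# Sensitivity and specificity are per-class rates, so replicating one
# class leaves them unchanged; precision mixes the classes and shifts.

def rates(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return sensitivity, specificity, precision

# Balanced test set: the classifier is right on 3/4 of each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

# Imbalanced test set: every negative sample replicated 10 times,
# with the classifier behaving identically on each copy.
y_true_imb = [1, 1, 1, 1] + [0] * 40
y_pred_imb = [1, 1, 1, 0] + [0, 0, 0, 1] * 10

sens1, spec1, prec1 = rates(y_true, y_pred)
sens2, spec2, prec2 = rates(y_true_imb, y_pred_imb)

print(sens1 == sens2, spec1 == spec2)  # True True
print(prec1, prec2)                    # 0.75 vs 3/13 ≈ 0.23
```

The two conditional-on-$Y$ rates are identical in both test sets, so every point of the ROC curve is unchanged, while precision drops from 0.75 to roughly 0.23.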

I cannot reconcile these concepts, likely due to a gap in my statistical rigour. I would greatly appreciate a more detailed example of why the above is true.


To be more specific, can someone explain the quote above? In particular, what does it mean for these probabilities to be conditioned on the true class label? What is the $P$ in $P(Y = 1)$ referring to? And why does this conditioning imply that ROC is insensitive to the class distribution?
In addition, I have read through almost every post related to this question, but I don't see a consensus on whether the ROC curve is sensitive or insensitive to class imbalance.

The posts I read (I know it's quite a lot): I even managed to implement the ROC curve in pure Python without an issue, but it seems that even though I can implement it, I still do not fully understand it.

Interpretation of ROC

Pros and Cons of AUROC
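For context, the kind of pure-Python ROC computation I mean looks roughly like this (a minimal sketch; it processes one sample per threshold and does not group tied scores):

```python
# Minimal ROC curve: sort samples by decreasing score, lower the
# threshold one sample at a time, and record (FPR, TPR) at each step.

def roc_curve(y_true, scores):
    pairs = sorted(zip(scores, y_true), reverse=True)
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr_list, tpr_list = [0.0], [0.0]  # start at (0, 0)
    tp = fp = 0
    for _score, label in pairs:
        if label == 1:
            tp += 1   # one more positive predicted positive
        else:
            fp += 1   # one more negative predicted positive
        tpr_list.append(tp / pos)  # conditioned on Y = 1 only
        fpr_list.append(fp / neg)  # conditioned on Y = 0 only
    return fpr_list, tpr_list

fpr, tpr = roc_curve([1, 0, 1, 0], [0.9, 0.8, 0.6, 0.3])
print(fpr)  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

Note that the TPR updates divide only by the positive count and the FPR updates divide only by the negative count, which is exactly the conditioning the quote is talking about.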


Latest Understanding 21st September 2021:

As Professor Frank Harrell mentions in the answer below, I can reinforce my understanding as follows:

$Y$ takes on the values 0 and 1, and the area under the ROC curve (call this value $a$) signifies, in simplified terms, that if you randomly draw one positive sample and one negative sample, the probability that the positive sample is ranked higher (i.e. receives a higher predicted probability) than the negative sample is $a$.
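A quick numeric check of this rank interpretation (my own sketch): $a$ equals the fraction of (positive, negative) pairs in which the positive sample scores higher, counting ties as half a win.

```python
# AUC as a pairwise ranking probability: for every (positive, negative)
# pair, count a win if the positive scores higher, half a win on ties.

def auc_by_pairs(y_true, scores):
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0]
s = [0.9, 0.7, 0.4, 0.5, 0.2]
print(auc_by_pairs(y, s))  # 5 of 6 pairs correctly ranked ≈ 0.833
```

This value matches the trapezoidal area under the ROC curve for the same scores, which is why the two definitions of $a$ coincide.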

Now, in his analogy, the teacher is the negative sample and the soccer star is the positive sample, so you condition on $Y = 0$ and on $Y = 1$ separately. Once you condition on, say, $Y = 0$ (specificity/TNR, or equivalently $1 - \text{FPR}$), your sample space effectively shrinks from the whole population of samples to only those with $Y = 0$. From this, I intuitively think that the $Y = 1$ samples play no part and hence do not influence the FPR in any way. The same reasoning applies to the TPR. As a result, neither the TPR nor the FPR depends on the whole sample space (the full distribution of the test set), and so neither is influenced by changes to the class distribution of the test set.

TODO: reason about why precision depends on the class distribution.
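A sketch toward this TODO, via Bayes' rule applied to the definitions quoted above:
$$
\text{Precision} = P(Y = 1 \mid \hat{Y} = 1)
= \frac{P(\hat{Y} = 1 \mid Y = 1)\,P(Y = 1)}{P(\hat{Y} = 1 \mid Y = 1)\,P(Y = 1) + P(\hat{Y} = 1 \mid Y = 0)\,P(Y = 0)}
= \frac{\text{TPR}\cdot\pi}{\text{TPR}\cdot\pi + \text{FPR}\cdot(1-\pi)},
$$
where $\pi = P(Y = 1)$ is the positive-class prevalence. Since $\pi$ appears explicitly in the formula, precision changes when the class distribution changes, even when TPR and FPR stay fixed.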

Best Answer

Since all points on an ROC curve condition on $Y$, the distribution of $Y$ is necessarily irrelevant to those points. This also points out why ROC curves should not be used except in a retrospective case-control study, where samples are taken separately from the $Y=0$ and $Y=1$ observations. For prospectively observed data, where we sample based on $X$ or take completely random samples, it is not logical to use a representation that disrespects how the samples arose. See https://www.fharrell.com/post/addvalue/