ROC Curve – Why the ROC Curve and Thresholds Never Have the Ideal Point at the Top Left, Even for Observations Close to Certainty

Tags: roc, threshold, true-positive-rate

I am using ROC curves for multi-label classification. I have a classifier that produces a score for each label, say a Logistic Regression that produces a probability. I understand that an ROC curve is parameterized by a discrimination threshold, and that an observation is assigned to a class when that class has the highest probability and that probability is above the threshold.

If so, imagine these predictions for 5 observations with labels A or B:

Observation #  Label  Prob(A)  Prob(B)
            1      A     0.90     0.10
            2      A     0.51     0.49
            3      B     0.51     0.49
            4      A     0.49     0.51
            5      B     0.49     0.51

The first observation is a freebie. With a discrimination threshold of 0.9, we assign that observation correctly and no observation incorrectly. So the True Positive count is 1 and all the other counts (True Negatives, False Positives, False Negatives) are zero. The True Positive Rate is 1 and the False Positive Rate is 0, which is the ideal point at the top left of an ROC curve. We never see that point in an ROC curve, so I suspect either my reasoning or my concept of True/False Positive Rates is wrong.

Another possibility is to assign an observation to the most likely class only when its probability is above the threshold, and to the negative class otherwise. But that approach lumps together observations we are sure belong to the negative class with observations we are merely unsure belong to the positive class. A consequence is that it is not invariant under relabeling (swapping positive and negative), as the sketch below shows.
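To illustrate that asymmetry, here is a minimal sketch of the rule just described, using the probabilities from the table above; the threshold 0.6 is an arbitrary choice for illustration:

```python
import numpy as np

# Predicted probabilities for observations 1-5; columns are Prob(A), Prob(B).
probs = np.array([[0.90, 0.10],
                  [0.51, 0.49],
                  [0.51, 0.49],
                  [0.49, 0.51],
                  [0.49, 0.51]])
classes = np.array(["A", "B"])
t = 0.6  # arbitrary threshold, chosen only for this illustration

def assign(probs, classes, negative):
    # Most likely class if its probability clears t; otherwise the negative class.
    best = probs.argmax(axis=1)
    return np.where(probs.max(axis=1) > t, classes[best], negative)

print(assign(probs, classes, negative="B"))  # A positive: ['A' 'B' 'B' 'B' 'B']
print(assign(probs, classes, negative="A"))  # B positive: ['A' 'A' 'A' 'A' 'A']
# Observations 2-5 switch predicted class when the labels are swapped,
# so the rule is not invariant under relabeling.
```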

How exactly does an ROC curve use the discrimination threshold?

Best Answer

You seem to have a few misunderstandings about ROC curves.

I am using ROC curves for multi-label classification.

ROC curves are tools to assess the discrimination ability of binary classifiers. Extensions exist for other kinds of problems, such as multi-class or multi-label classification, but strictly speaking they are not ROC curves.

an ROC curve is parameterized by a discrimination threshold

An ROC curve is parameterized over all possible discrimination thresholds, from $-\infty$ to $+\infty$.
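As a minimal sketch of what that means in practice, here is the curve traced for the question's five observations, taking class A as the positive class and sweeping the threshold from above the highest score down to below the lowest (each distinct score produces one point):

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0])                # labels: A=1 (positive), B=0
scores = np.array([0.9, 0.51, 0.51, 0.49, 0.49])  # Prob(A) from the model

# Sweep the threshold over all values that change the predictions,
# computing one (FPR, TPR) point per threshold; together they trace the ROC curve.
thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1], [-np.inf]))
for t in thresholds:
    y_pred = scores >= t                          # predict positive when score >= t
    tpr = np.sum(y_pred & (y_true == 1)) / np.sum(y_true == 1)  # sensitivity
    fpr = np.sum(y_pred & (y_true == 0)) / np.sum(y_true == 0)  # 1 - specificity
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Note that the curve always starts at (0, 0) (threshold $+\infty$, nothing predicted positive) and ends at (1, 1) (threshold $-\infty$, everything predicted positive).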

With a discrimination threshold of 0.9, we assign that observation correctly and no observation incorrectly.

With a threshold of 0.9, we indeed (correctly) assign observation 1 to the positive predicted class.

Because observations 2-5 score below 0.9, they are all assigned to the negative predicted class. As a result, observations 2 and 4, which are truly positive, become false negatives, and they decrease the True Positive Rate (sensitivity) and the AUC.
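To make the counts concrete, here is a small sketch of the confusion at threshold 0.9, again taking class A as the positive class:

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0])                # A=1 (positive), B=0
scores = np.array([0.9, 0.51, 0.51, 0.49, 0.49])  # Prob(A)

y_pred = scores >= 0.9                 # only observation 1 clears the threshold

tp = np.sum(y_pred & (y_true == 1))    # 1: observation 1
fn = np.sum(~y_pred & (y_true == 1))   # 2: observations 2 and 4
fp = np.sum(y_pred & (y_true == 0))    # 0
tn = np.sum(~y_pred & (y_true == 0))   # 2: observations 3 and 5

print(f"TPR = {tp / (tp + fn):.2f}")   # 0.33, not 1: false negatives count too
print(f"FPR = {fp / (fp + tn):.2f}")   # 0.00
```

So the point contributed by threshold 0.9 is (0, 1/3), not the top-left corner (0, 1).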

Because the ROC curve is designed for binary classification problems, there is no such thing as "unassigned": if observations are not positive, they are negative. If this assumption is not appropriate for your problem, then you do not have a binary classification problem, and ROC curves may be the wrong tool for you.

The True Positive Rate is 1 and the False Positive Rate is 0, which is the ideal point at the top left in an ROC curve. We never see that point in an ROC curve

This is wrong: this point appears as soon as you have a perfect classifier. That might be hard to achieve in your field or for your problem, but it definitely exists.
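For instance, here is a toy example with hypothetical, perfectly separating scores (not the questioner's model):

```python
from sklearn.metrics import roc_curve

# Hypothetical scores that separate the classes perfectly:
# every positive scores higher than every negative.
y_true  = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]

fpr, tpr, _ = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))   # includes (0.0, 1.0): the ideal top-left point
```

Any threshold above 0.3 and at most 0.7 classifies every observation correctly, so the curve passes through (0, 1) and the AUC is 1.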

How exactly does an ROC curve use the discrimination threshold?

I refer you to this CV question: "What does AUC stand for and what is it?", which should answer this part of your question.
