In a typical ROC curve all I do is a line from (0, 0)
to (tpr, fpr)
and a line from (tpr, fpr)
to (1, 1)
. Now I see ROC curves with more than one points. Can someone explain what these extra points represent?
Solved – ROC curve with multiple points
roc
Related Solutions
As I see it, the possibility to refuse classification as "too uncertain" is the whole point of choosing a threshold (as opposed to assigning the class with highest predicted probability).
Of course, you should have some justification for putting the threshold to 0.5: you may also put it up to 0.9 or any other value that is reasonable.
You describe a setup with mutually exclusive classes (closed-world problem). "No class reaches the threshold" can always happen as soon as that threshold is higher than 1/$n_{classes}$, i.e. the same problem occurs in a 2-class problem with threshold, say, 0.9. For threshold = 1/$n_{classes}$ it could happen in theory, but in practice it is highly unlikely.
So your problem is not related (just more pronounced) to the 3-class set-up.
To your second question: you can compute ROC curves for any kind of continuous output scores, they don't even need to claim that they are probabilities. Personally, I don't calibrate, because I don't want to waste another test set on that (I work with very restricted sample sizes). The shape of the ROC anyways won't change.
Answer to your comment: The ROC conceptually belongs to a set-up that in my field is called single-class classification: does a patient have a particular disease or not. From that point of view, you can assign a 10% probability that the patient does have the disease. But this does not imply that with 90% probability he has something defined - the complementary 90% actually belong to a "dummy" class: not having that disease. For some diseases & tests, finding everyone may be so important that you set your working point at a threshold of 0.1. Textbook example where you choose an extreme working point is HIV test in blood donations.
So for constucting the ROC for class A (you'd say: the patient is A positive), you look at class A posterior probabilities only. For binary classification with probability (not A) = 1 - probability (A), you don't need to plot the second ROC as it does not contain any information that is not readily accessible from the first one.
In your 3 class set up you can plot a ROC for each class. Depending on how you choose your threshold, no classification, exactly one class, or more than one class assigned can result. What is sensible depends on your problem. E.g. if the classes are "Hepatitis", "HIV", and "broken arm", then this policy is appropriate as a patient may have none or all of these.
I agree with your concerns.
given that people in reality will seldom choose a FPR cut-off of 0.5 or higher, why people would prefer a ROC curve with FPR ranging from 0 to 1 and use the full AUC value (i.e. calculate the entire area under the ROC curve) instead of just reporting the area made from, say, 0 to 0.25 or to 0.5? Is that called "partial AUC"?
- I'm a big fan of having the complete ROC, as it gives much more information that just the sensitivity/specificity pair of one working point of a classifier.
- For the same reason, I'm not a big fan of summarizing all that information even further into one single number. But if you have to do so, I agree that it is better to restrict the calculations to parts of the ROC that are relevant for the application.
in the figure below, what can we say about the performances of the three models? The AUC values are: green (0.805), red (0.815), blue (0.768). The red curve turns out to be superior, but as you see, the superiority is only reflected after FPR > 0.2. Thanks :)
- That depends entirely on your application. In your example, if high specificity is needed, then the green classifier would be best. If high sensitivity is needed, go for the red one.
As to the comparison of classifiers: there are lots of questions and answers here discussing this. Summary:
- classifier comparison is far more difficult than one would expect at first
- not all classifier performance measures are good for this task. Read @FrankHarrells answers, and go for so-called proper scoring rules (e.g. Brier's score/mean squared error).
Best Answer
That pair TPR, FPR you put in your curve was obtained using a threshold in the continuous output (most likely probabilities).
Other points are obtained by simply changing said threshold.
A clarification: Sampling many points in the ROC curve is the norm, I don't remember ever seeing a ROC with such few samples. You can select different strategies to sample your TPR and FPR, but sampling enough points is essential, otherwise you wouldn't need the ROC at all.