Solved – How to plot ROC for multi-class classifier? One-vs-All or One-vs-One

multi-class, r, roc

I am trying to evaluate a multi-class classification model (say, a random forest) that I built in R. So far, my approach has been to look at confusionMatrix() from the caret package and the other statistics it computes, but it only gives me per-class results. I am also calculating micro- and macro-averages to estimate the overall performance of the model.

Now, when I am trying to plot the ROC curve, I have two options:

One-vs-One approach: gives me ⁿC₂ ROC curves, one for each pair of classes, which I am not sure how to interpret. The pROC package implements this method (multiclass.roc), which was suggested by Hand and Till, because it supposedly gives a more accurate AUC.
One-vs-All approach: gives me n ROC curves and their corresponding AUCs. I found a reference for this in another, similar question.

Is there any preferred method for plotting the ROC in this scenario? If so, why?

And can this choice between One-vs-One and One-vs-All be generalized to any multi-class model?

Best Answer

When you use ROC curves, you are implying that

misclassification costs are not the same for different types of mistakes. If they were, you would simply optimize classification accuracy, which would be the most adequate objective function and also more intuitive than any of the alternatives.
you don't know the actual misclassification costs and therefore want to compare classification performance across multiple cutoffs. If you did know them, you would just put the actual costs into the objective function to optimize.
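If the costs were known (the second point above), you could skip ROC analysis altogether and minimize expected cost directly. A minimal base-R sketch of that idea, assuming posterior class probabilities are available: for each case, predict the class with the lowest expected cost. The probabilities and the cost matrix (missing an $A$ costs 5, every other mistake costs 1) are made-up illustration values, not anything from the question.

```r
# Posterior class probabilities: one row per case, one column per class
# (made-up values for illustration).
probs <- matrix(c(0.6, 0.3, 0.1,
                  0.2, 0.5, 0.3,
                  0.3, 0.3, 0.4),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))

# Hypothetical cost matrix: rows = true class, columns = predicted class.
# Missing an A costs 5, every other mistake costs 1, correct answers cost 0.
costs <- matrix(c(0, 5, 5,
                  1, 0, 1,
                  1, 1, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(true = c("A", "B", "C"),
                                predicted = c("A", "B", "C")))

# Expected cost of predicting class j for case i:
# sum over true classes k of P(k | case i) * cost[k, j]
expected_cost <- probs %*% costs

# Cost-minimizing decision vs. plain "most probable class"
min_cost_class <- colnames(costs)[apply(expected_cost, 1, which.min)]
argmax_class   <- colnames(probs)[apply(probs, 1, which.max)]

min_cost_class
argmax_class
```

With these costs every case is assigned to $A$, whereas picking the most probable class gives A, B, C; this is the kind of shift a known cost matrix produces, and no cutoff sweep is needed.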

If both of these conditions hold, you may look into ROC curves, though the F-measure and kappa are worth considering as well.

The one-vs-all approach reflects that missing some class, like $A$, may or may not be more costly than missing the other classes $B$ or $C$. The first ROC curve sweeps over possible cutoffs for $A$ vs. ($B$ or $C$), the second for $B$ vs. ($A$ or $C$), and the last one for $C$ vs. ($A$ or $B$).
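A minimal base-R sketch of the one-vs-all approach, assuming a matrix of posterior probabilities with one column per class: each class's column is used as the score for that class against the union of all other classes. The helper names (auc_rank, one_vs_all_auc) and the toy data are hypothetical. The AUC is computed with the Mann-Whitney rank formula, i.e. the probability that a random positive case scores higher than a random negative one, with ties counting 1/2.

```r
# Rank-based (Mann-Whitney) AUC: probability that a random positive case
# scores higher than a random negative case, counting ties as 1/2.
auc_rank <- function(scores, is_pos) {
  r <- rank(scores)              # ties get their average rank
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One-vs-all: for each class k, use P(k) as the score and
# "true class is k" vs. "true class is anything else" as the labels.
one_vs_all_auc <- function(probs, truth) {
  classes <- colnames(probs)
  sapply(classes, function(k) auc_rank(probs[, k], truth == k))
}

# Toy data (made up): posterior probabilities for 6 cases and 3 classes.
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.5, 0.3, 0.2,
                  0.2, 0.6, 0.2,
                  0.3, 0.4, 0.3,
                  0.1, 0.2, 0.7,
                  0.2, 0.5, 0.3),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))
truth <- c("A", "A", "B", "B", "C", "C")

aucs <- one_vs_all_auc(probs, truth)
aucs
```

For the toy data this gives A = 1, B = 0.875 and C = 0.9375; each value summarizes the cutoff sweep for one class against the rest.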

The one-vs-one approach is only necessary if, for a given class $A$, mistaking an $A$ for a $B$ also has a different cost than mistaking an $A$ for a $C$. This is the most general case, but also the least intuitive.
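A sketch of the pairwise (one-vs-one) computation, following Hand and Till's definition: for each pair of classes $i, j$ only the cases from those two classes are used, $A(i|j)$ is the AUC of the class-$i$ probability for separating $i$ from $j$, $A(j|i)$ is the same with the class-$j$ probability, the pair's value is their mean, and the overall $M$ is the mean over all $\binom{n}{2}$ pairs. The helper names and toy data are hypothetical, and this is not pROC's own code, whose multiclass.roc may differ in details.

```r
# Rank-based (Mann-Whitney) AUC, ties counted as 1/2.
auc_rank <- function(scores, is_pos) {
  r <- rank(scores)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Pairwise AUCs: for classes i and j keep only cases from i and j;
# A(i|j) uses P(i) as the score with class i as positive,
# A(j|i) uses P(j) as the score with class j as positive.
hand_till <- function(probs, truth) {
  pairs <- combn(colnames(probs), 2)        # one column per class pair
  pair_auc <- apply(pairs, 2, function(p) {
    i <- p[1]
    j <- p[2]
    keep <- truth %in% c(i, j)
    a_ij <- auc_rank(probs[keep, i], truth[keep] == i)
    a_ji <- auc_rank(probs[keep, j], truth[keep] == j)
    (a_ij + a_ji) / 2
  })
  names(pair_auc) <- apply(pairs, 2, paste, collapse = " vs ")
  # M = average over all n(n-1)/2 pairs
  list(pairwise = pair_auc, M = mean(pair_auc))
}

# Same made-up toy data as a one-vs-all example would use.
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.5, 0.3, 0.2,
                  0.2, 0.6, 0.2,
                  0.3, 0.4, 0.3,
                  0.1, 0.2, 0.7,
                  0.2, 0.5, 0.3),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))
truth <- c("A", "A", "B", "B", "C", "C")

res <- hand_till(probs, truth)
res$pairwise
res$M
```

On this toy data the pairs A vs B and A vs C are perfectly separated, B vs C gives 0.8125, and $M = 0.9375$. Note that one pairwise value still mixes two directions, so reading off "A mistaken for B" vs. "A mistaken for C" requires looking at the individual $A(i|j)$ terms.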

Related Solutions

Solved – Decision threshold for a 3-class Naive Bayes ROC curve

As I see it, the possibility of refusing classification as "too uncertain" is the whole point of choosing a threshold (as opposed to assigning the class with the highest predicted probability).

Of course, you should have some justification for setting the threshold at 0.5: you might just as well set it to 0.9 or any other reasonable value.

You describe a setup with mutually exclusive classes (a closed-world problem). "No class reaches the threshold" can always happen as soon as that threshold is higher than $1/n_{classes}$; so the same problem occurs in a 2-class problem with a threshold of, say, 0.9. For a threshold of exactly $1/n_{classes}$ it could happen in theory, but in practice it is highly unlikely: the posterior probabilities sum to 1, so the largest of them is at least $1/n_{classes}$, with equality only when all classes are exactly tied.
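The reject option can be sketched in base R as follows. classify_with_reject is a hypothetical helper, the probabilities are made up, and reading "reaches the threshold" as `>=` is an assumption.

```r
# Reject option: assign the most probable class only if its probability
# reaches the threshold (">=" here, an assumption); otherwise return NA,
# i.e. "too uncertain to classify".
classify_with_reject <- function(probs, threshold) {
  best <- apply(probs, 1, which.max)                  # column of the top class
  best_p <- probs[cbind(seq_len(nrow(probs)), best)]  # its probability
  ifelse(best_p >= threshold, colnames(probs)[best], NA_character_)
}

# Made-up 3-class posteriors (each row sums to 1).
probs3 <- matrix(c(0.92, 0.05, 0.03,
                   0.40, 0.35, 0.25,
                   0.34, 0.33, 0.33),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("A", "B", "C")))

classify_with_reject(probs3, 0.9)    # only the first case is classified
classify_with_reject(probs3, 1 / 3)  # nothing is rejected

# The same issue in a 2-class problem with threshold 0.9.
probs2 <- matrix(c(0.6, 0.4), nrow = 1,
                 dimnames = list(NULL, c("A", "B")))
classify_with_reject(probs2, 0.9)    # rejected (NA)
```

With `>=`, a threshold of exactly $1/n_{classes}$ never rejects, because the top probability is always at least that large; with a strict `>`, only an exact tie would be rejected.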

So your problem is not specific to the 3-class set-up; it is just more pronounced there.

To your second question: you can compute ROC curves for any kind of continuous output score; the scores don't even need to claim to be probabilities. Personally, I don't calibrate, because I don't want to spend another test set on that (I work with very restricted sample sizes). The shape of the ROC won't change anyway: the ROC depends only on how the scores rank the cases, so any strictly increasing recalibration of the scores leaves it unchanged.
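A minimal base-R illustration that a monotone calibration does not change the ROC. The scores and labels are made up, and plogis(1.7 * s - 0.4) is a hypothetical Platt-style calibration with invented coefficients; auc_rank is a hypothetical helper using the Mann-Whitney rank formula.

```r
# Rank-based (Mann-Whitney) AUC, ties counted as 1/2.
auc_rank <- function(scores, is_pos) {
  r <- rank(scores)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up uncalibrated scores (e.g. margins) and true labels.
scores <- c(2.1, -0.3, 0.8, 1.5, -1.2, 0.1, 3.0, -0.7)
is_pos <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

# Hypothetical Platt-style calibration: a logistic function of the score.
# It is strictly increasing, so it keeps the order of the cases.
calibrated <- plogis(1.7 * scores - 0.4)

auc_rank(scores, is_pos)                    # 0.875
auc_rank(calibrated, is_pos)                # 0.875: same ranking, same ROC
identical(rank(scores), rank(calibrated))   # TRUE
```

Because every cutoff on the raw scores corresponds to a cutoff on the calibrated scores that selects exactly the same cases, the whole set of (FPR, TPR) points is identical, not just the AUC.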

Answer to your comment: conceptually, the ROC belongs to a set-up that in my field is called single-class classification: does a patient have a particular disease or not? From that point of view, you can assign a 10% probability that the patient has the disease. But this does not imply that with 90% probability he has some other, well-defined condition; the complementary 90% actually belong to a "dummy" class: not having that disease. For some diseases and tests, finding everyone may be so important that you set your working point at a threshold of 0.1. The textbook example of choosing such an extreme working point is HIV testing of blood donations.

So for constructing the ROC for class A (you'd say: the patient is A-positive), you look at the class A posterior probabilities only. For binary classification with $P(\text{not } A) = 1 - P(A)$, you don't need to plot the second ROC, as it does not contain any information that is not readily accessible from the first one.
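A small check of the binary case with made-up probabilities: the ROC for "not A", built from $1 - P(A)$ with the labels flipped, has the same AUC as the ROC for A, so it adds no new information. auc_rank is a hypothetical helper using the Mann-Whitney rank formula.

```r
# Rank-based (Mann-Whitney) AUC, ties counted as 1/2.
auc_rank <- function(scores, is_pos) {
  r <- rank(scores)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up binary example: P(A) for six patients and whether they are A-positive.
p_A  <- c(0.9, 0.8, 0.65, 0.4, 0.3, 0.1)
is_A <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

auc_A     <- auc_rank(p_A, is_A)          # ROC for "A", score P(A)
auc_not_A <- auc_rank(1 - p_A, !is_A)     # ROC for "not A", score 1 - P(A)

c(auc_A = auc_A, auc_not_A = auc_not_A)   # both 8/9
```

Flipping both the score and the labels reverses every pairwise comparison twice, which is why the two AUCs coincide.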

In your 3-class set-up you can plot a ROC for each class. Depending on how you choose your thresholds, no class, exactly one class, or more than one class may be assigned. What is sensible depends on your problem. E.g., if the classes are "Hepatitis", "HIV", and "broken arm", then this per-class policy is appropriate, as a patient may have none, some, or all of these.
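A sketch of the per-class policy, assuming each class gets its own score (e.g. from separate one-vs-all models, so the scores need not sum to 1) and its own threshold. assign_classes, the scores, and the thresholds are hypothetical; the low HIV threshold mimics the extreme working point discussed above.

```r
# Per-class thresholding: each class is checked on its own, so a case can get
# no class, one class, or several classes.
assign_classes <- function(scores, thresholds) {
  hits <- sweep(scores, 2, thresholds, FUN = ">=")   # case x class logical matrix
  lapply(seq_len(nrow(scores)), function(i) colnames(scores)[hits[i, ]])
}

# Hypothetical per-class scores, e.g. from three separate one-vs-all models
# (rows need not sum to 1).
scores <- matrix(c(0.05, 0.02, 0.10,
                   0.40, 0.01, 0.95,
                   0.10, 0.20, 0.05,
                   0.60, 0.50, 0.90),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("Hepatitis", "HIV", "broken arm")))

# Made-up thresholds, one per column; the low HIV threshold favours
# finding every HIV case.
thresholds <- c(0.3, 0.1, 0.5)

res <- assign_classes(scores, thresholds)
res
```

The four cases receive no class, two classes, one class, and all three classes, respectively. A list is returned because the number of assigned classes varies from case to case.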

Solved – R – Plotting a ROC curve for a Naive Bayes classifier using ROCR. Not sure if I’m plotting it correctly

The problem may lie here:

pr <- prediction(pred, realResults)

You can transform "pred" and "realResults" into 0/1 vectors with:

# "Lost" is treated as the positive class (1), everything else as 0
predvec <- ifelse(pred=="Lost", 1, 0)
realvec <- ifelse(realResults=="Lost", 1, 0)

and then calling:

library(ROCR)
pr <- prediction(predvec, realvec)

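This may solve the problem. Note, however, that if pred holds hard class labels, the resulting ROC curve has only a single operating point; for a full curve, pass a continuous score such as the predicted probability of "Lost" instead.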
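To see how many operating points a score vector can produce, here is a base-R sketch that does not need ROCR. roc_points is a hypothetical helper, and the labels and probabilities are made up. Hard 0/1 predictions give only two distinct cutoffs, i.e. one point between (0, 0) and (1, 1), while predicted probabilities give one cutoff per distinct value.

```r
# Hypothetical helper: (TPR, FPR) at every distinct cutoff, using "score >= cutoff".
roc_points <- function(scores, labels01) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  data.frame(
    cutoff = cutoffs,
    tpr = sapply(cutoffs, function(t) mean(scores[labels01 == 1] >= t)),
    fpr = sapply(cutoffs, function(t) mean(scores[labels01 == 0] >= t))
  )
}

realvec   <- c(1, 1, 0, 1, 0, 0)                 # 1 = "Lost" (made-up labels)
prob_lost <- c(0.9, 0.7, 0.6, 0.4, 0.3, 0.2)     # made-up P("Lost")
predvec   <- ifelse(prob_lost > 0.5, 1, 0)       # hard labels at a 0.5 cutoff

roc_points(predvec, realvec)    # 2 cutoffs: a single interior point
roc_points(prob_lost, realvec)  # 6 cutoffs: a full step curve
```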

Bonus part:

You can plot the ROC curve with more information (the curve color-coded by cutoff, with selected cutoff values printed) with:

ROCRperf <- performance(pr, "tpr", "fpr")
plot(ROCRperf, colorize=TRUE, print.cutoffs.at=seq(0,1,by=0.1), text.adj=c(-0.2,1.7))

And a simple way to get the AUC:

as.numeric(performance(pr, "auc")@y.values)
