Solved – AUC and class imbalance in training/test dataset

auc, model-evaluation, roc

I have just started learning about the area under the ROC curve (AUC). I am told that AUC is not affected by class imbalance. I take this to mean that AUC is insensitive to imbalance in the test data, rather than to imbalance in the training data.

In other words, if we only change the ratio of positive to negative classes in the test data, the AUC value may not change much. But if we change the distribution in the training data, the AUC value may change a lot, because the classifier can no longer be learned well. In that case we would have to resort to undersampling or oversampling. Am I right? I just want to make sure my understanding of AUC is correct.
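To make the oversampling remedy concrete, here is a minimal sketch of random oversampling in plain Python. The function name `random_oversample` and the toy data are my own invention, not from any particular library; libraries such as imbalanced-learn provide more sophisticated variants.

```python
import random

def random_oversample(X, y, minority_label=1, seed=0):
    # Duplicate randomly chosen minority-class rows until both
    # classes contain the same number of examples.
    rng = random.Random(seed)
    minority = [(x, lbl) for x, lbl in zip(X, y) if lbl == minority_label]
    majority = [(x, lbl) for x, lbl in zip(X, y) if lbl != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [lbl for _, lbl in balanced]

X = [[0.1], [0.2], [0.3], [0.4], [0.9]]   # toy features
y = [0, 0, 0, 0, 1]                       # 4 negatives, 1 positive
Xb, yb = random_oversample(X, y)
print(yb.count(0), yb.count(1))           # 4 4
```

Only the training set should be resampled like this; the test set must keep its natural distribution so that evaluation remains honest.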

Best Answer

It depends on what you mean by sensitive. The ROC AUC is sensitive to class imbalance in the sense that when there is a minority class, you typically define it as the positive class, and how well that class is predicted has a strong impact on the AUC value. This is very much desirable behaviour. Accuracy, for example, is not sensitive in that way: it can be very high even when the minority class is not well predicted at all.
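The contrast can be demonstrated with a small pure-Python sketch. The `auc` helper below computes AUC as the fraction of (positive, negative) pairs ranked correctly (the Mann-Whitney interpretation); the degenerate "classifier" and the 95/5 split are made up for illustration.

```python
def auc(pos_scores, neg_scores):
    # Probability that a random positive outscores a random negative,
    # counting ties as half (equivalent to the Mann-Whitney U statistic).
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# A degenerate "classifier" that assigns every example the same score,
# i.e. it effectively predicts the majority (negative) class everywhere.
neg_scores = [0.1] * 95   # 95 negatives
pos_scores = [0.1] * 5    # 5 positives

accuracy = 95 / 100       # all negatives right, all positives wrong
print(accuracy)                     # 0.95 — looks impressive
print(auc(pos_scores, neg_scores))  # 0.5 — no better than random guessing
```

Accuracy rewards the model for ignoring the minority class entirely, while the AUC of 0.5 exposes that the scores carry no information about the positive class.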

In most experimental setups (bootstrap or cross-validation, for example) the class distributions of the training and test sets should be similar, but this is a result of how you sample those sets, not of using or not using ROC. You are basically right to say that the ROC abstracts away the class imbalance in the test set by giving equal importance to sensitivity and specificity. When the training set does not contain enough examples to learn a class, though, this will still affect the ROC, as it should.
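The test-set invariance can be checked numerically. In this sketch the score distributions are invented (positives drawn around 0.7, negatives around 0.4, standing in for a hypothetical fixed classifier), and `auc` is the same pairwise-ranking helper as above: because AUC depends only on each class's score distribution, thinning the negatives to change the class ratio leaves it roughly unchanged.

```python
import random

def auc(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs ranked correctly, ties count half.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

random.seed(0)
# Simulated scores from a fixed classifier: positives tend to score higher.
pos = [random.gauss(0.7, 0.15) for _ in range(200)]
neg = [random.gauss(0.4, 0.15) for _ in range(2000)]

auc_full = auc(pos, neg)       # original 1:10 positive:negative test ratio
auc_sub = auc(pos, neg[::10])  # keep every 10th negative -> roughly 1:1 ratio
print(round(auc_full, 3), round(auc_sub, 3))  # the two values agree closely
```

Accuracy at a fixed threshold would shift noticeably under the same resampling, since it weights each class by how many examples it contributes.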

What you do in terms of oversampling and parameter tuning is a separate issue. The ROC can only ever tell you how well a specific configuration works; you can then try multiple configurations and select the best one.
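That selection step might look like the following sketch, where the two configurations and their held-out scores are hypothetical and `auc` is the same pairwise-ranking helper as in the earlier snippets:

```python
def auc(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs ranked correctly, ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical held-out scores from two candidate configurations,
# given as (positive-class scores, negative-class scores).
configs = {
    "config_a": ([0.9, 0.8, 0.7], [0.4, 0.3, 0.6]),
    "config_b": ([0.60, 0.50, 0.55], [0.50, 0.45, 0.52]),
}
best = max(configs, key=lambda name: auc(*configs[name]))
print(best)  # config_a
```

In practice you would compare each configuration's AUC on the same validation folds, so the comparison is not confounded by the split.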