ROC vs Precision-recall curves on imbalanced dataset

machine learning, model selection, precision-recall, roc, unbalanced-classes

I just finished reading this discussion. They argue that PR AUC is better than ROC AUC on imbalanced datasets.

For example, suppose the test dataset has 10 samples: 9 are positive and 1 is negative. We have a terrible model that predicts everything as positive. The confusion matrix is then TP = 9, FP = 1, TN = 0, FN = 0.

Then Precision = TP/(TP+FP) = 0.9 and Recall = TP/(TP+FN) = 1.0. Both precision and recall are very high, yet the classifier is poor.

On the other hand, TPR = TP/(TP+FN) = 1.0, FPR = FP/(FP+TN) = 1.0. Because the FPR is very high, we can identify that this is not a good classifier.
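The arithmetic above can be checked with a minimal Python sketch. The helper name `rates` is hypothetical and only here for illustration; the counts are the ones from the example.

```python
def rates(tp, fp, tn, fn):
    """Compute precision, recall (= TPR) and FPR from confusion-matrix counts."""
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) else nan
    recall = tp / (tp + fn) if (tp + fn) else nan  # recall and TPR are the same quantity
    fpr = fp / (fp + tn) if (fp + tn) else nan
    return {"precision": precision, "recall": recall, "tpr": recall, "fpr": fpr}

# The "predict everything positive" model on 9 positives and 1 negative.
m = rates(tp=9, fp=1, tn=0, fn=0)
print(m)
```

Note that recall and TPR are identical by definition, so the two views differ only in the second axis: precision for PR, FPR for ROC.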

Clearly, ROC is better than PR on imbalanced datasets. Can somebody explain why PR is better?

Best Answer

First, the claim on the Kaggle post is bogus. The paper they reference, "The Relationship Between Precision-Recall and ROC Curves", never claims that PR AUC is better than ROC AUC. They simply compare their properties, without judging their value.

ROC curves can be misleading in very imbalanced applications. A ROC curve can still look pretty good (i.e., better than random) while misclassifying most or all of the minority class.

In contrast, PR curves are specifically tailored for the detection of rare events and are pretty useful in those scenarios. They will show that your classifier performs poorly if it is misclassifying most or all of the minority class. But they don't translate well to more balanced cases, or cases where negatives are rare.
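To make the contrast concrete, here is a rough sketch on a made-up dataset (1000 negatives, 10 positives, hand-picked scores; none of this comes from the Kaggle post or the paper). Both functions are simple pure-Python stand-ins written for illustration, and `average_precision` assumes there are no tied scores. The positives rank fairly high, so the ROC AUC looks good, yet whenever a positive is retrieved roughly nine out of ten flagged samples are false alarms, which the PR summary exposes.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / hits

# Hypothetical scores: negatives spread over 0..999, positives near the top
# but each still outranked by a handful of negatives.
neg_scores = [float(i) for i in range(1000)]
pos_scores = [900.5 + 10 * k for k in range(10)]
labels = [0] * 1000 + [1] * 10
scores = neg_scores + pos_scores

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(f"ROC AUC = {auc:.3f}, average precision = {ap:.3f}, baseline = {prevalence:.3f}")
```

Here the ROC AUC comes out around 0.95 while the average precision stays below 0.1. That is still well above the prevalence baseline of about 0.01, so the two summaries agree that the model beats random; they disagree on how usable it is for finding the rare class.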

In addition, because they are sensitive to the baseline probability of positive events, they don't generalize well and only apply to the specific dataset they were built on, or to datasets with the exact same balance. This means it is generally difficult to compare PR curves from different studies, limiting their usefulness.
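The dependence on the baseline rate follows from writing precision in terms of TPR, FPR and the positive prevalence: precision = TPR·p / (TPR·p + FPR·(1 − p)). A small sketch, using an illustrative operating point (TPR = 0.9, FPR = 0.1) that is an assumption rather than anything from the post, shows the same classifier yielding very different precision as the class balance changes, while its ROC point stays fixed.

```python
def precision_at(tpr, fpr, prevalence):
    """Precision implied by a fixed (TPR, FPR) operating point at a given positive rate."""
    tp = tpr * prevalence        # expected fraction of samples that are true positives
    fp = fpr * (1 - prevalence)  # expected fraction of samples that are false positives
    return tp / (tp + fp)

# Same operating point, three different class balances.
for prevalence in (0.5, 0.1, 0.01):
    print(prevalence, round(precision_at(0.9, 0.1, prevalence), 3))
```

This is why a PR curve measured at one class balance cannot be directly compared with one measured at another.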

As always, it is important to understand the tools that are available to you and select the right one for the right application. I suggest reading the question ROC vs precision-and-recall curves here on CV.