Solved – Interpretation of the area under the PR curve

Tags: machine-learning, precision-recall, roc

I'm currently comparing three methods, for which I have accuracy, auROC, and auPR as metrics. I have the following results:

Method A – acc: 0.75, auROC: 0.75, auPR: 0.45

Method B – acc: 0.65, auROC: 0.55, auPR: 0.40

Method C – acc: 0.55, auROC: 0.70, auPR: 0.65

I have a good understanding of accuracy and auROC (to remember it, I often come up with a sentence like "auROC characterizes the ability to predict the positive class well"; while not exactly correct, it helps me remember). I have never had auPR data before, and while I understand how the curve is built, I can't get the "feeling" behind it.

In fact, I fail to understand why method C has such a high auPR score while being bad or average on accuracy and auROC.

If someone could help me understand it a little better with a simple explanation, that would be great. Thank you.

Best Answer

One axis of the ROC and PR curves is the same, namely the TPR (recall): how many positive cases have been classified correctly, out of all positive cases in the data.

The other axis is different. ROC uses the FPR: how many cases were mistakenly declared positive, out of all negatives in the data. The PR curve uses precision: how many true positives, out of all cases predicted positive. So the denominator of the second axis is different: ROC uses what's in the data, while PR uses what's in the prediction as its basis.
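To make the different denominators concrete, here is a small sketch with hypothetical confusion-matrix counts (the numbers are made up for illustration):

```python
# Hypothetical confusion-matrix counts: 50 actual positives, 950 actual negatives.
tp, fp, fn, tn = 30, 10, 20, 940

# TPR (recall): denominator is all actual positives in the data.
tpr = tp / (tp + fn)
# FPR: denominator is all actual negatives in the data.
fpr = fp / (fp + tn)
# Precision: denominator is everything the model predicted positive.
precision = tp / (tp + fp)

print(f"TPR={tpr:.3f}  FPR={fpr:.3f}  precision={precision:.3f}")
# → TPR=0.600  FPR=0.011  precision=0.750
```

Note how FPR looks tiny simply because there are many negatives in the data, whereas precision is computed only over the 40 predicted positives.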

The PR curve is considered more informative when there is high class imbalance in the data; see Davis and Goadrich, "The Relationship Between Precision-Recall and ROC Curves": http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf
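A quick way to feel the imbalance effect: hold a classifier's ROC operating point fixed (say TPR = 0.8, FPR = 0.1, both hypothetical) and grow the negative class. The ROC point does not move, but precision collapses:

```python
# Fixed hypothetical operating point of some classifier.
tpr, fpr = 0.8, 0.1

# Same classifier, increasingly imbalanced data sets.
for n_pos, n_neg in [(100, 100), (100, 1000), (100, 10000)]:
    tp = tpr * n_pos          # expected true positives
    fp = fpr * n_neg          # expected false positives grow with the negatives
    precision = tp / (tp + fp)
    print(f"{n_neg:>5} negatives → precision={precision:.3f}")
# → precision 0.889, then 0.444, then 0.069
```

This is why a method can look fine by auROC yet poor by auPR (or vice versa) once the positive class is rare, and why method C's high auPR means it keeps its predicted positives relatively clean.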