Solved – Area under the Precision-Recall curve – similar interpretation to AUROC?

classification, machine-learning, precision-recall

I am trying to interpret the AUCPR. Say I have the following Precision-Recall curve.

Firstly:
It ends at 0.38 on the y-axis because this particular plot has slightly imbalanced data, so am I correct in thinking it's the count of the positive class divided by the count of the negative class?

If I had a negative class of 26300 and a positive class of 10000 then the PR curve would stop at 10000 / 26300 = 0.38.

Secondly:
Regarding the AUCPR: I understand that perfect performance would be a straight line running along the top of the plot, and if this curve had an AUCPR of 0.90 it would be considered good. But how can I interpret it when the value is, say, 0.50? That couldn't correspond to random guessing as it would for AUROC, since the curve ends at 0.38. Is there a way to know when the model becomes worse than flipping a coin?

[Precision-Recall curve that ends at 0.38 on the precision axis]

Best Answer

The area under the PR curve has a simple interpretation: it is the expected precision when uniformly varying the recall. The PR curve also has a one-to-one correspondence with the ROC curve, as shown by Davis & Goadrich (2006), "The Relationship Between Precision-Recall and ROC Curves".
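For concreteness, here is a minimal sketch (assuming scikit-learn is available, and using made-up scores rather than the asker's data) of how a PR curve and its area are typically computed:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

rng = np.random.default_rng(0)
# Hypothetical imbalanced data: ~27.5% positives, scores loosely correlated with the label
y_true = rng.binomial(1, 0.275, size=5000)
y_score = np.where(y_true == 1,
                   rng.normal(0.6, 0.3, 5000),
                   rng.normal(0.4, 0.3, 5000))

precision, recall, _ = precision_recall_curve(y_true, y_score)
print("AUCPR (trapezoidal):", auc(recall, precision))
print("Average precision  :", average_precision_score(y_true, y_score))
```

Note that scikit-learn's `average_precision_score` is a step-wise summary of the same curve, so the two numbers are close but not identical.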

A more informative interpretation of the AUCPR can be found in Flach & Kull (2015), "Precision-Recall-Gain Curves: PR Analysis Done Right". There, the authors present the concept of Precision-Recall-Gain curves, a reformulation of the usual PR curve in which the "always positive" classifier serves as the baseline. Following that, the AUPRG (area under the Precision-Recall-Gain curve) can be interpreted in terms of the expected $F_1$ score.
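To make the gain reformulation concrete, here is a rough sketch of the precision-gain / recall-gain transformation, following the definitions in Flach & Kull (2015) and reusing the `precision`, `recall`, and `y_true` arrays from the sketch above; the trapezoidal area computed this way is only an approximation of the quantity the paper works with:

```python
import numpy as np
from sklearn.metrics import auc

def gain(metric, pi):
    # (metric - pi) / ((1 - pi) * metric): the baseline pi maps to 0, a perfect 1 maps to 1
    metric = np.asarray(metric, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (metric - pi) / ((1 - pi) * metric)

pi = y_true.mean()                      # prevalence of the positive class
prec_gain = gain(precision, pi)
rec_gain = gain(recall, pi)

# Keep only the region where the model beats the "always positive" baseline
mask = (prec_gain >= 0) & (rec_gain >= 0)
order = np.argsort(rec_gain[mask])
auprg_approx = auc(rec_gain[mask][order], prec_gain[mask][order])
print("Approximate AUPRG:", auprg_approx)
```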

Regarding the specific side-questions raised:

  1. The end point of $0.38$ relates to the class imbalance, but the baseline of a PR curve is the proportion of positive examples within the sample, not the ratio of positives to negatives. So in the particular case mentioned, having $26300$ negative and $10000$ positive examples leads to a baseline of $10000/(10000+26300) \approx 0.275$. A more extensive discussion of the baseline of a PR curve can be found in the following CV.SE thread: What is "baseline" in precision recall curve.
  2. The AUCPR in itself is not very informative. As mentioned, it relates to the expected precision when uniformly varying the recall. In that sense, the question of being "worse than flipping a coin" quickly loses meaning as we move to more imbalanced datasets; if we know, for example, that we have a 90-10 imbalance, a "fair coin" is a poor baseline and it is to our advantage to compare against a "loaded coin" instead. This is where the work of Flach & Kull cited above comes into play; it directly models the gain in precision and recall relative to a "recall-aware" baseline. To that extent, one might also want to look at Cohen's $\kappa$ as a quick measurement, since it directly accounts for the accuracy expected by chance; personally, I use it almost always when first looking at a binary classifier's results. CV.SE has an excellent thread explaining it in more detail: Cohen's kappa in plain English. (Both the baseline calculation and Cohen's $\kappa$ are illustrated in the sketch below.)
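
Both checks from the list above can be spelled out in a few lines. The counts are the ones from the question; the labels used for Cohen's $\kappa$ are made-up, purely illustrative arrays:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# 1. Baseline of the PR curve: the positive prevalence, using the counts from the question
n_pos, n_neg = 10_000, 26_300
print(f"PR baseline: {n_pos / (n_pos + n_neg):.3f}")   # ~0.275, not 10000/26300 ~ 0.38

# 2. Cohen's kappa on hard predictions (toy example): 0 means "no better than chance",
#    1 means perfect agreement between predictions and true labels
y_true = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0])
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
```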