Solved – Area under the Precision-Recall curve – similar interpretation to AUROC?

classification, machine-learning, precision-recall

I am trying to interpret the AUCPR. Say I have the following Precision-Recall curve.

Firstly:
It ends at 0.38 on the y-axis because this particular plot has slightly imbalanced data, so am I correct in thinking it's the count of the positive class divided by the count of the negative class?

If I had a negative class of 26300 and a positive class of 10000 then the PR curve would stop at 10000 / 26300 = 0.38.

Secondly:
Regarding the AUCPR: I understand that perfect performance would be a straight line running along the top of the plot, and if this curve had an AUCPR of 0.90 it would be considered good. But how can I interpret it when the value is, say, 0.50? That couldn't correspond to random guessing as it would for AUROC, since the curve ends at 0.38. Is there a way to know when the model becomes worse than flipping a coin?

[Precision-Recall curve that ends at 0.38 on the precision axis]

Best Answer

The area under the PR curve has a simple interpretation: it is the expected precision when uniformly varying the recall. The PR curve also has a one-to-one correspondence with the ROC curve, as shown by Davis & Goadrich (2006), "The Relationship Between Precision-Recall and ROC Curves".
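For concreteness, here is a minimal sketch (assuming scikit-learn is available, and using made-up scores rather than the asker's data) of how a PR curve and its area are typically computed:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

rng = np.random.default_rng(0)
# Hypothetical imbalanced data: ~27.5% positives, scores loosely correlated with the label
y_true = rng.binomial(1, 0.275, size=5000)
y_score = np.where(y_true == 1,
                   rng.normal(0.6, 0.3, 5000),
                   rng.normal(0.4, 0.3, 5000))

precision, recall, _ = precision_recall_curve(y_true, y_score)
print("AUCPR (trapezoidal):", auc(recall, precision))
print("Average precision  :", average_precision_score(y_true, y_score))
```

Note that scikit-learn's `average_precision_score` is a step-wise summary of the same curve, so the two numbers are close but not identical.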

A more informative interpretation of the AUCPR can be found in Flach & Kull (2015), "Precision-Recall-Gain Curves: PR Analysis Done Right". There, the authors present the concept of Precision-Recall-Gain curves, a reformulation of the usual PR curve in which the "always positive" classifier serves as the baseline. Following that, the AUPRG (area under the Precision-Recall-Gain curve) can be interpreted in terms of the expected $F_1$ score.
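To make the gain reformulation concrete, here is a rough sketch of the precision-gain / recall-gain transformation, following the definitions in Flach & Kull (2015) and reusing the `precision`, `recall`, and `y_true` arrays from the sketch above; the trapezoidal area computed this way is only an approximation of the quantity the paper works with:

```python
import numpy as np
from sklearn.metrics import auc

def gain(metric, pi):
    # (metric - pi) / ((1 - pi) * metric): the baseline pi maps to 0, a perfect 1 maps to 1
    metric = np.asarray(metric, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (metric - pi) / ((1 - pi) * metric)

pi = y_true.mean()                      # prevalence of the positive class
prec_gain = gain(precision, pi)
rec_gain = gain(recall, pi)

# Keep only the region where the model beats the "always positive" baseline
mask = (prec_gain >= 0) & (rec_gain >= 0)
order = np.argsort(rec_gain[mask])
auprg_approx = auc(rec_gain[mask][order], prec_gain[mask][order])
print("Approximate AUPRG:", auprg_approx)
```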

Regarding the specific side-questions raised:

  1. The end point of $0.38$ relates to the class imbalance, but the baseline of a PR curve is the proportion of positive examples within the sample, not the ratio of positives to negatives. So in the particular case mentioned, having $26300$ negative and $10000$ positive examples leads to a baseline of $10000/(10000+26300) \approx 0.275$. A more extensive discussion of the baseline of a PR curve can be found in the following CV.SE thread: What is "baseline" in precision recall curve.
  2. The AUCPR in itself is not very informative. As mentioned, it relates to the expected precision when uniformly varying the recall. In that sense, the question of being "worse than flipping a coin" quickly loses meaning as we move to more imbalanced datasets; if we know, for example, that we have a 90-10 imbalance, a "fair coin" is a poor baseline and it is to our advantage to compare against a "loaded coin" instead. This is where the work of Flach & Kull cited above comes into play; it directly models the gain in precision and recall relative to a "recall-aware" baseline. To that extent, one might also want to look at Cohen's $\kappa$ as a quick measurement, since it directly accounts for the accuracy expected by chance; personally, I use it almost always when first looking at a binary classifier's results. CV.SE has an excellent thread explaining it in more detail: Cohen's kappa in plain English. (Both the baseline calculation and Cohen's $\kappa$ are illustrated in the sketch below.)
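
Both checks from the list above can be spelled out in a few lines. The counts are the ones from the question; the labels used for Cohen's $\kappa$ are made-up, purely illustrative arrays:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# 1. Baseline of the PR curve: the positive prevalence, using the counts from the question
n_pos, n_neg = 10_000, 26_300
print(f"PR baseline: {n_pos / (n_pos + n_neg):.3f}")   # ~0.275, not 10000/26300 ~ 0.38

# 2. Cohen's kappa on hard predictions (toy example): 0 means "no better than chance",
#    1 means perfect agreement between predictions and true labels
y_true = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0])
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
```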