Solved – Threshold in precision/recall curve

precision-recall, threshold

While reading Torgo's Data Mining with R, I found that its description of the precision/recall curve differs from other treatments I have seen. Usually, these curves are based on a threshold that determines which probability values are high enough to declare that an event has occurred, so that future cases can be classified accordingly. However, Torgo's description is as follows:

Precision/recall (PR) curves are visual representations of the
performance of a model in terms of the precision and recall
statistics. The curves are obtained by proper interpolation of the
values of the statistics at different working points. These working
points can be given by different cut-off limits on a ranking of the
class of interest provided by the model. In our case this would
correspond to different effort limits applied to the outlier ranking
produced by the models. Iterating over different limits (i.e., inspect
less or more reports), we get different values of precision and
recall. PR curves allow this type of analysis.

The application the author has in mind is a fraud detection problem: a classification task whose outcomes are fraud, unknown, and ok. We would like to output probabilities, rank the reports by them, select the first $k$, and inspect those.
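Concretely, I imagine something like the following (toy data and made-up names, just for illustration), where the effort limit is simply the number of top-ranked reports we can afford to inspect:

reports = [("r1", 0.92), ("r2", 0.15), ("r3", 0.71), ("r4", 0.40)]  # (report id, predicted P(fraud)) - invented values
k = 2                                                               # effort limit: how many reports we can inspect
top_k = sorted(reports, key=lambda r: r[1], reverse=True)[:k]       # rank by probability, keep the top k
print(top_k)                                                        # [('r1', 0.92), ('r3', 0.71)]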

Is this an alternative notion of the threshold in precision/recall curves? I think it is, assuming that probabilities below 0.5 are classified as ok, exactly 0.5 as unknown, and above 0.5 as fraud. Is that a correct assumption to make?

Thanks a lot!

Best Answer

Short answer: Torgo describes the usual method of generating such curves.

You can choose your threshold (= cut-off limit in the cited text) at any value. The cited text refers to one such choice as a working point.
That is, for a given working point you will observe exactly one (precision, recall) pair, i.e. one point in your graph. The precision-recall curve is obtained by varying the threshold over the whole range of the classifier's continuous output ("scores", posterior probabilities, "votes"), thus generating a curve from many working points.
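To make that concrete, here is a minimal Python sketch (the data and function name are my own, chosen to match the example further down): each threshold yields one working point, and sweeping the threshold over the score range traces out the curve.

def working_point(scores, labels, threshold, positive="B"):
    # One working point: predict the positive class whenever score >= threshold.
    tp = sum(s >= threshold and y == positive for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y != positive for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == positive for s, y in zip(scores, labels))
    precision = tp / (tp + fp) if tp + fp else None  # undefined if nothing is predicted positive
    recall = tp / (tp + fn)
    return precision, recall

scores = [0.2, 0.5, 0.6, 0.9]        # classifier output, as in the example below
labels = ["A", "B", "A", "B"]        # true classes
for t in (0.95, 0.7, 0.55, 0.3):     # four thresholds -> four working points
    print(t, working_point(scores, labels, t))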


Edit with respect to the comment:

I think "varying the threshold" is the usual way to explain or define the curve.

For the calculation, it is more efficient to sort the scores and then track how precision and recall change as each successive case is included: they can only change when the threshold crosses the next predicted score. (A code sketch of this sweep follows the example below.)

Consider this example:

case   true class   predicted score (high => class B)
1      A            0.2
3      B            0.5
2      A            0.6
4      B            0.9

(Predicting class B whenever the score is at or above the threshold:)

threshold      recall    precision
> 0.9          0.0       N/A (0/0)
(0.6, 0.9]     0.5       1.0
(0.5, 0.6]     0.5       0.5
(0.2, 0.5]     1.0       0.67
<= 0.2         1.0       0.5
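The same numbers fall out of the sorted sweep directly; here is a rough Python sketch (names are my own, and tied scores are ignored for brevity):

def pr_sweep(scores, labels, positive="B"):
    # Sort cases by score, highest first; lowering the threshold past each
    # score admits one more case, so precision and recall only change there.
    total_pos = sum(y == positive for y in labels)
    tp = fp = 0
    points = []
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label == positive:
            tp += 1
        else:
            fp += 1
        points.append((score, tp / total_pos, tp / (tp + fp)))
    return points

scores = [0.2, 0.5, 0.6, 0.9]
labels = ["A", "B", "A", "B"]
for score, recall, precision in pr_sweep(scores, labels):
    print(f"threshold down to {score}: recall={recall:.2f}, precision={precision:.2f}")
# threshold down to 0.9: recall=0.50, precision=1.00
# threshold down to 0.6: recall=0.50, precision=0.50
# threshold down to 0.5: recall=1.00, precision=0.67
# threshold down to 0.2: recall=1.00, precision=0.50

Each step updates the counts in constant time, so the whole curve costs one sort, i.e. O(n log n), instead of a full re-count of the data per threshold.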

That is, the precision-recall curve actually consists of discrete points. It jumps from one point to the next as the threshold crosses an actually predicted score. A smooth-looking curve results only for large numbers of test cases.