I wrote a function for this purpose, based on the exercise in the book Data Mining with R:
# Function: evaluation metrics
## True positives (TP) - correctly identified as success
## True negatives (TN) - correctly identified as failure
## False positives (FP) - failure incorrectly identified as success
## False negatives (FN) - success incorrectly identified as failure
## Precision - P = TP/(TP+FP): of those identified as successes, how many actually are
## Recall - R = TP/(TP+FN): of the actual successes, how many were correctly identified
## F-score - F = (2 * P * R)/(P + R): harmonic mean of precision and recall
prf <- function(predAct){
  ## predAct is a two-column data frame of predicted, actual
  preds <- predAct[, 1]
  trues <- predAct[, 2]
  xTab  <- table(preds, trues) # rows = predicted class, columns = actual class
  clss  <- as.character(sort(unique(preds)))
  r <- matrix(NA, ncol = 7, nrow = 1,
              dimnames = list(c(), c('Acc',
                                     paste("P", clss[1], sep = '_'),
                                     paste("R", clss[1], sep = '_'),
                                     paste("F", clss[1], sep = '_'),
                                     paste("P", clss[2], sep = '_'),
                                     paste("R", clss[2], sep = '_'),
                                     paste("F", clss[2], sep = '_'))))
  r[1,1] <- sum(xTab[1,1], xTab[2,2])/sum(xTab)   # Accuracy
  r[1,2] <- xTab[1,1]/sum(xTab[1,])               # Miss Precision: TP / all predicted misses
  r[1,3] <- xTab[1,1]/sum(xTab[,1])               # Miss Recall: TP / all actual misses
  r[1,4] <- (2*r[1,2]*r[1,3])/sum(r[1,2], r[1,3]) # Miss F
  r[1,5] <- xTab[2,2]/sum(xTab[2,])               # Hit Precision: TP / all predicted hits
  r[1,6] <- xTab[2,2]/sum(xTab[,2])               # Hit Recall: TP / all actual hits
  r[1,7] <- (2*r[1,5]*r[1,6])/sum(r[1,5], r[1,6]) # Hit F
  r
}
For any binary classification task, this returns the precision, recall, and F-score for each class, plus the overall accuracy, like so:
> pred <- rbinom(100,1,.7)
> act <- rbinom(100,1,.7)
> predAct <- data.frame(pred,act)
> prf(predAct)
Acc P_0 R_0 F_0 P_1 R_1 F_1
[1,] 0.63 0.34375 0.4074074 0.3728814 0.7647059 0.7123288 0.7375887
Calculating the P, R, and F for each class like this lets you see whether one class or the other is giving you more difficulty, and it's then easy to calculate the overall P, R, and F stats, as sketched below. I haven't used the ROCR package, but you could derive ROC curves in the same spirit by training the classifier over the range of some parameter and calling the function on the classifiers at points along that range.
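For instance, one common choice for the overall stats is macro-averaging, i.e. the unweighted mean of the per-class values. A minimal sketch, assuming the 0/1 class labels from the example above (so the columns are named P_0, P_1, and so on):
res <- prf(predAct)
## Macro-averaged precision and recall: the unweighted mean over the two classes
macroP <- mean(c(res[1, "P_0"], res[1, "P_1"]))
macroR <- mean(c(res[1, "R_0"], res[1, "R_1"]))
## F as the harmonic mean of the macro-averaged precision and recall
macroF <- (2 * macroP * macroR) / (macroP + macroR)
c(P = macroP, R = macroR, F = macroF)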
Generating a PR curve is similar to generating an ROC curve: to draw either plot you need a full ranking of the test set. To produce this ranking, you need a classifier which outputs a decision value rather than a binary answer. The decision value is a measure of confidence in a prediction, which we can use to rank all test instances. As an example, the decision values of logistic regression and an SVM are a probability and a (signed) distance to the separating hyperplane, respectively.
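As a sketch of where such decision values come from in R (train and test are hypothetical data frames with a binary y column):
## Logistic regression: the decision value is the fitted probability
fit <- glm(y ~ ., data = train, family = binomial)
dec.vals <- predict(fit, newdata = test, type = "response") # probabilities in (0, 1)
## For an SVM (e.g. the e1071 package), the decision value is the signed distance:
## fit <- svm(y ~ ., data = train)
## dec.vals <- attr(predict(fit, test, decision.values = TRUE), "decision.values")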
If you have decision values at your disposal, you can define a set of thresholds on them. These thresholds correspond to different settings of the classifier: e.g. they let you control its level of conservatism. For logistic regression, the default threshold would be $f(\mathbf{x}) = 0.5$, but you can sweep the entire range $(0, 1)$. Typically, the thresholds are chosen to be the unique decision values your model yielded for the test set.
At each choice of threshold, your model yields different predictions (e.g. a different number of positive and negative predictions). As such, you get a different precision and recall at every threshold, i.e. a set of tuples $( T_i, P_i, R_i )$. The PR curve is then drawn from the $( P_i, R_i )$ pairs.
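A minimal sketch of that procedure, assuming dec.vals holds the decision values from above and trues holds the 0/1 test labels:
## One (T, P, R) tuple per unique decision value used as a threshold
pr.points <- t(sapply(sort(unique(dec.vals)), function(th) {
  preds <- as.numeric(dec.vals >= th) # predict positive at or above the threshold
  TP <- sum(preds == 1 & trues == 1)
  c(T = th,
    P = TP / sum(preds == 1), # precision at this threshold
    R = TP / sum(trues == 1)) # recall at this threshold
}))
## The PR curve is drawn from the (R, P) pairs
plot(pr.points[, "R"], pr.points[, "P"], type = "l",
     xlab = "Recall", ylab = "Precision", xlim = c(0, 1), ylim = c(0, 1))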
If I understood your comment correctly, the total similarity score you compute can be used as a decision value.
Best Answer
Working convention: Point $(0,1)$ is the upper left corner and corresponds to $0$ Recall (i.e. no Recall) and $1$ Precision (i.e. perfect Precision).
Regarding the first question: the starting point can be anywhere on the Recall $= 0$ or Recall $= \frac{1}{n_+}$ line; where the PR curve starts depends on the classifier's performance. While we would hope to start at the point $(\frac{1}{n_+},1)$ and then slowly increase our Recall at little expense to Precision (i.e. we are very precise to begin with and gradually sacrifice Precision for Recall), that is not guaranteed at all. The obvious example is when we misclassify the "most probable" example of our test set. In that case we have both $0$ Recall and $0$ Precision, i.e. we start from the point $(0,0)$. For example, in the left-most graph shown below (red line), we have an artificial example where we start at the point $(0,0.5)$ because the first $\frac{N}{2}$ points are indistinguishable from each other. We "immediately" classify some examples correctly (i.e. we get TPs and thus non-zero Recall), but at the same time we get an equal number of FPs, leaving us at $0.5$ Precision.
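To see the $(0,0)$ start concretely, here is a tiny sketch with hypothetical decision values where the top-ranked example is in fact a negative:
scores <- c(0.9, 0.8, 0.7, 0.6) # hypothetical decision values, sorted descending
labels <- c(0, 1, 1, 0)         # the "most probable" example is actually a negative
preds  <- as.numeric(scores >= 0.9) # strictest threshold: only the top example is positive
c(P = sum(preds & labels) / sum(preds),  # 0/1 = 0 Precision
  R = sum(preds & labels) / sum(labels)) # 0/2 = 0 Recall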
Please note that when no positive predictions have been made (no TPs or FPs), Precision is undefined. There is no general rule for what to do in that case.
sklearn sets this to $1$, strictly for its own convenience, and explicitly says that these points "do not have a corresponding threshold". In that respect, in Davis & Goadrich (2006) the procedure for constructing a PR curve when presented with an algorithm returning probabilities is: "first find the probability that each test set example is positive, next sort this list and then traverse the sorted list in ascending order"; as such, it is implied/suggested that for a probability at which no example is positive, it makes no sense to construct a PR curve. In R, PRROC::pr.curve does a similar thing, where the origin is at $(0,0)$ from the first positive example (example shown in pr3 below). Side note: in Python this leads to the slightly awkward situation of having, at Recall $0$, both Precision $0$ and Precision $1$ at the same time.
Regarding the second question: Yes, the ideal classifier has AUCPR equal to $1$. The only way to touch the ideal point $(1,1)$ and yet have AUCPR less than $1$ is if we somehow moved towards $(1,1)$ without already having perfect Precision (i.e. $y = 1$). On occasion PR curves have a "sawtooth" shape (e.g. the middle graph shown below (dark green)), which suggests a significant jump in performance. That "tooth" though can never reach the point $(1,1)$, because by definition there are already some misclassified points. The "sawtooth" effect is due to us having a batch of correctly classified points, which moves both our Precision and Recall higher, followed by a batch of wrongly classified points that causes a sharp dip in Precision. To get the upward slope we increased our TP count while our FP count remained the same (and our FN count correspondingly dropped), but that does not mean we removed the previously misclassified points; we can therefore never reach perfect Precision at $y = 1$. For example, in the right-most graph shown below (blue), a single point prohibits us from hitting $\text{AUCPR} = 1$; that misclassified FP point actually ranks higher than every point in the positive class and thus forces our PR curve to start at $(0,0)$.
OK, and some R code to see this first-hand: