Solved – Calculating precision and recall in R


Suppose I'm building a logistic regression classifier that predicts whether someone is married or single (1 = married, 0 = single). I want to choose a point on the precision-recall curve that gives me at least 75% precision, so I want to choose thresholds $t_1$ and $t_2$ such that:

  • If the output of my classifier is greater than $t_1$, I output "married".
  • If the output is below $t_2$, I output "single".
  • If the output is in between, I output "I don't know".
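
To make the setup concrete, the three-way decision rule could be sketched like this in R (the threshold values and the `probs` vector are hypothetical, just for illustration):

```r
# Hypothetical thresholds, with t1 >= t2
t1 <- 0.8
t2 <- 0.4
# Hypothetical predicted probabilities from the classifier
probs <- c(0.95, 0.10, 0.55, 0.85, 0.30)
decision <- ifelse(probs > t1, "married",
            ifelse(probs < t2, "single", "I don't know"))
decision
# "married" "single" "I don't know" "married" "single"
```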

A couple questions:

  1. I think under the standard definition of precision, precision will be measuring the precision of the married class alone (i.e., precision = # times I correctly predict married / total # times I predict married). However, what I really want to do is measure the overall precision (i.e., the total # times I correctly predict married or single / total # times I predict married or single). Is this an okay thing to do? If not, what should I be doing?
  2. Is there a way to calculate this "overall" precision/recall curve in R (e.g., using the ROCR package or some other library)? I'm currently using the ROCR package, but it seems to only give me the single-class-at-a-time precision/recall.
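
For question 1, the "overall" precision described above could be computed directly; a minimal sketch, assuming `decision` is a 1/0/NA vector (NA meaning "I don't know") and `truth` holds the true 0/1 labels:

```r
# Overall precision: correct confident predictions / all confident predictions.
# Abstentions (NA) are excluded from both numerator and denominator.
overall_precision <- function(decision, truth) {
    answered <- !is.na(decision)
    sum(decision[answered] == truth[answered]) / sum(answered)
}
```

On a labeled validation set you could then sweep $t_1$ and $t_2$ until this quantity reaches the desired 75%.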

Best Answer

I wrote a function for this purpose, based on the exercise in the book Data Mining with R:

# Function: evaluation metrics
    ## True positives (TP) - correctly identified as success
    ## True negatives (TN) - correctly identified as failure
    ## False positives (FP) - failures incorrectly identified as successes
    ## False negatives (FN) - successes incorrectly identified as failures
    ## Precision - P = TP/(TP+FP): of the cases identified as a class, how many actually were
    ## Recall - R = TP/(TP+FN): of the actual members of a class, how many were identified
    ## F-score - F = (2 * P * R)/(P + R): harmonic mean of precision and recall
prf <- function(predAct){
    ## predAct is a two-column dataframe of predicted, actual
    preds <- predAct[, 1]
    trues <- predAct[, 2]
    xTab <- table(preds, trues)  # rows = predicted, columns = actual
    clss <- as.character(sort(unique(preds)))
    r <- matrix(NA, ncol = 7, nrow = 1, 
        dimnames = list(c(), c('Acc',
        paste("P", clss[1], sep = '_'), 
        paste("R", clss[1], sep = '_'), 
        paste("F", clss[1], sep = '_'), 
        paste("P", clss[2], sep = '_'), 
        paste("R", clss[2], sep = '_'), 
        paste("F", clss[2], sep = '_'))))
    r[1,1] <- sum(xTab[1,1], xTab[2,2])/sum(xTab) # Accuracy
    r[1,2] <- xTab[1,1]/sum(xTab[1,]) # Precision for class 0 (row sum = all predicted 0)
    r[1,3] <- xTab[1,1]/sum(xTab[,1]) # Recall for class 0 (column sum = all actual 0)
    r[1,4] <- (2*r[1,2]*r[1,3])/sum(r[1,2], r[1,3]) # F for class 0
    r[1,5] <- xTab[2,2]/sum(xTab[2,]) # Precision for class 1
    r[1,6] <- xTab[2,2]/sum(xTab[,2]) # Recall for class 1
    r[1,7] <- (2*r[1,5]*r[1,6])/sum(r[1,5], r[1,6]) # F for class 1
    r
}

For any binary classification task, this returns the precision, recall, and F-score for each class, along with the overall accuracy, like so:

> pred <- rbinom(100, 1, .7)
> act <- rbinom(100, 1, .7)
> predAct <- data.frame(pred, act)
> prf(predAct)
      Acc       P_0     R_0       F_0       P_1       R_1       F_1
[1,] 0.63 0.4074074 0.34375 0.3728814 0.7123288 0.7647059 0.7375887

Calculating the P, R, and F for each class like this lets you see whether one or the other is giving you more difficulty, and it's easy to then calculate the overall P, R, F stats. I haven't used the ROCR package, but you could easily derive the same ROC curves by training the classifier over the range of some parameter and calling the function for classifiers at points along the range.
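
As a sketch of that last idea (the `scores` and `act` data here are simulated placeholders, not a real classifier's output), you could sweep the decision threshold and collect the `prf` metrics at each point to trace out a precision-recall curve:

```r
set.seed(1)
act    <- rbinom(100, 1, .7)  # true 0/1 labels (simulated)
scores <- runif(100)          # predicted probabilities (simulated)

# Keep thresholds away from the extremes so both classes get predicted;
# otherwise the 2x2 table indexing inside prf() breaks.
thresholds <- seq(0.2, 0.8, by = 0.1)
metrics <- t(sapply(thresholds, function(t) {
    prf(data.frame(pred = as.integer(scores > t), act = act))
}))
rownames(metrics) <- thresholds
```

Plotting, say, `metrics[, 5]` against `metrics[, 6]` (precision vs. recall for class 1) then gives the curve for the "married" class.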