Solved – precision recall breakeven point

measurement, model-evaluation, precision-recall

There are two popular measures that aggregate $Precision$ and $Recall$: $F1$ and the $\text{Precision Recall Breakeven point}$.

$F1$ can be calculated easily from its formula, but how do I calculate the $\text{breakeven point}$?

I ran an experiment and have all four values: true positives, false positives, true negatives, and false negatives. I can now calculate Precision and Recall, but how do I find the breakeven point in this case?

Best Answer

There is an excellent post (Obtaining predicted values (Y=1 or 0) from a logistic regression model fit) about the break-even point of sensitivity (which is the same as recall) and specificity. Specificity is not the same as precision, but it should be easy to generalize from there.

If you look at the plot, you will see a point where the metrics cross; that crossing is the break-even point, and you can use it as your cutoff.
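For precision and recall specifically, the crossing condition can be written in terms of the four counts from the question. This is a short derivation using the standard definitions, with $TP$, $FP$ and $FN$ denoting the true positive, false positive and false negative counts:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$\text{Precision} = \text{Recall} \iff TP + FP = TP + FN \iff FP = FN \qquad (TP > 0)$$

A single confusion matrix therefore sits at the break-even point only if $FP = FN$. Otherwise the cutoff has to be varied until the two counts match. At that cutoff, $F1$ equals the common value of precision and recall.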

EDIT: I have updated the code to include precision, recall, and F1.

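# Compute performance metrics for a fitted binomial glm at a given cutoff.
# cut: probability cutoff; mod: fitted glm; y: observed 0/1 outcome.
perf = function(cut, mod, y)
{
     yhat = (fitted(mod) > cut)           # predicted class at this cutoff
     w = which(y==1)
     sensitivity = mean( yhat[w] == 1 )   # true positive rate (same as recall)
     specificity = mean( yhat[-w] == 0 )  # true negative rate
     c.rate = mean( y==yhat )             # overall classification rate
     # Euclidean distance from the ideal point (sensitivity, specificity) = (1, 1)
     d = cbind(sensitivity,specificity)-c(1,1)
     d = sqrt( d[1]^2 + d[2]^2 )

     # F-score
     retrieved <- sum(yhat)                   # number of predicted positives
     precision <- sum(yhat & y) / retrieved   # NaN when nothing is predicted positive
     recall <- sum(yhat & y) / sum(y)
     Fmeasure <- 2 * precision * recall / (precision + recall)
     out = t(as.matrix(c(sensitivity, specificity, c.rate,d, Fmeasure)))
     colnames(out) = c("sensitivity", "specificity", "c.rate", "distance", "F-score")
     return(out)
}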

y3.mod <- glm(y3 ~ x1 + x2 + x3 + x4 + x5 + x6, family=binomial()) 

par(mfrow=c(1,1))
s = seq(.01,.99,length=100)
OUT = matrix(0,100,5)
for(i in 1:100) OUT[i,]=perf(s[i],y3.mod,y3)
plot(s,OUT[,1],xlab="Cutoff",ylab="Value",cex.lab=1.5,cex.axis=1.5,ylim=c(0,1),type="l",lwd=2,axes=FALSE,col=2)
axis(1,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.)
axis(2,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.)
lines(s,OUT[,2],col="darkgreen",lwd=2)
lines(s,OUT[,3],col=4,lwd=2)
lines(s,OUT[,4],col="darkred",lwd=2)
lines(s,OUT[,5],col="black",lwd=2)
grid()
box()
legend("topleft",col=c(2,"darkgreen",4,"darkred","black"),lwd=c(2,2,2,2,2),c("Sensitivity","Specificity","Classification Rate","Distance","F-score"))
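The same cutoff sweep can be pointed directly at precision and recall to locate their break-even point. The following is a minimal sketch, not part of the original answer: the helper `pr_at`, the synthetic data, and the single-predictor model are all assumptions made for illustration. It takes the break-even point to be the grid cutoff where the gap between precision and recall is smallest.

```r
# Hypothetical helper: precision and recall at a single cutoff.
# scores: predicted probabilities; y: observed 0/1 labels.
pr_at <- function(cut, scores, y) {
  yhat <- scores > cut
  tp <- sum(yhat & y == 1)
  c(precision = if (sum(yhat) > 0) tp / sum(yhat) else NA_real_,
    recall    = tp / sum(y == 1))
}

# Synthetic data, for illustration only (not the data from the answer).
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
mod <- glm(y ~ x, family = binomial())
scores <- fitted(mod)

# Evaluate precision and recall on a grid of cutoffs.
cuts <- seq(0.01, 0.99, length.out = 99)
pr <- t(sapply(cuts, pr_at, scores = scores, y = y))

# Break-even: the cutoff where |precision - recall| is smallest.
gap <- abs(pr[, "precision"] - pr[, "recall"])
best <- which.min(gap)   # which.min skips NA entries
bep <- c(cutoff = cuts[best], pr[best, ])
print(bep)
```

Because the grid is finite, precision and recall at the reported cutoff are only approximately equal. A finer grid, or using the sorted scores themselves as cutoffs, narrows the gap.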