I wrote a function for this purpose, based on the exercise in the book Data Mining with R:
# Function: evaluation metrics
## True positives (TP) - correctly identified as success
## True negatives (TN) - correctly identified as failure
## False positives (FP) - failure incorrectly identified as success
## False negatives (FN) - success incorrectly identified as failure
## Precision - P = TP/(TP+FP) - of those predicted as a class, how many actually are
## Recall - R = TP/(TP+FN) - of the actual members of a class, how many were identified
## F-score - F = (2 * P * R)/(P + R) - harmonic mean of precision and recall
prf <- function(predAct) {
  ## predAct is a two-column data frame of predicted, actual
  preds <- predAct[, 1]
  trues <- predAct[, 2]
  xTab <- table(preds, trues)  # rows = predicted, columns = actual
  clss <- as.character(sort(unique(preds)))
  r <- matrix(NA, ncol = 7, nrow = 1,
              dimnames = list(c(), c('Acc',
                                     paste("P", clss[1], sep = '_'),
                                     paste("R", clss[1], sep = '_'),
                                     paste("F", clss[1], sep = '_'),
                                     paste("P", clss[2], sep = '_'),
                                     paste("R", clss[2], sep = '_'),
                                     paste("F", clss[2], sep = '_'))))
  r[1, 1] <- sum(xTab[1, 1], xTab[2, 2]) / sum(xTab)           # Accuracy
  r[1, 2] <- xTab[1, 1] / sum(xTab[1, ])                       # Miss Precision (TP / predicted miss)
  r[1, 3] <- xTab[1, 1] / sum(xTab[, 1])                       # Miss Recall (TP / actual miss)
  r[1, 4] <- (2 * r[1, 2] * r[1, 3]) / sum(r[1, 2], r[1, 3])   # Miss F
  r[1, 5] <- xTab[2, 2] / sum(xTab[2, ])                       # Hit Precision (TP / predicted hit)
  r[1, 6] <- xTab[2, 2] / sum(xTab[, 2])                       # Hit Recall (TP / actual hit)
  r[1, 7] <- (2 * r[1, 5] * r[1, 6]) / sum(r[1, 5], r[1, 6])   # Hit F
  r
}
For any binary classification task, this returns the precision, recall, and F-score for each class, plus the overall accuracy, like so:
> pred <- rbinom(100,1,.7)
> act <- rbinom(100,1,.7)
> predAct <- data.frame(pred,act)
> prf(predAct)
Acc P_0 R_0 F_0 P_1 R_1 F_1
[1,] 0.63 0.34375 0.4074074 0.3728814 0.7647059 0.7123288 0.7375887
Calculating P, R, and F for each class like this lets you see whether one class or the other is giving you more difficulty, and it is then easy to compute the overall (e.g., macro-averaged) P, R, and F. I haven't used the ROCR package, but you could derive the same ROC curves by training the classifier over the range of some parameter and calling the function for the classifiers at points along that range.
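As a minimal sketch of that overall calculation, here is a macro-average helper. It assumes a one-row result matrix with the column names shown in the example output above (classes 0 and 1); the name `macro_prf` is mine, not part of the original code:

```r
# Macro-averaged P, R, and F: the unweighted mean of the per-class values.
# Assumes `res` is a 1-row matrix with columns named like the prf() output
# above (P_0, R_0, F_0, P_1, R_1, F_1).
macro_prf <- function(res) {
  c(P = mean(res[1, c("P_0", "P_1")]),
    R = mean(res[1, c("R_0", "R_1")]),
    F = mean(res[1, c("F_0", "F_1")]))
}
```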
There is an excellent post (Obtaining predicted values (Y=1 or 0) from a logistic regression model fit) about the break-even point of sensitivity and specificity. Sensitivity is the same as recall, and while specificity is not the same as precision, it should be easy to generalize from there.
If you look at the plot, you will see a point where the metrics cross; this is your optimal cutoff point.
EDIT: I have updated the code to include precision, recall, and the F-score.
perf <- function(cut, mod, y) {
  yhat <- as.numeric(mod$fitted.values > cut)
  w <- which(y == 1)
  sensitivity <- mean(yhat[w] == 1)
  specificity <- mean(yhat[-w] == 0)
  c.rate <- mean(y == yhat)
  # Distance from the ideal corner (sensitivity, specificity) = (1, 1)
  d <- cbind(sensitivity, specificity) - c(1, 1)
  d <- sqrt(d[1]^2 + d[2]^2)
  # F-score
  retrieved <- sum(yhat)
  precision <- sum(yhat & y) / retrieved
  recall <- sum(yhat & y) / sum(y)
  Fmeasure <- 2 * precision * recall / (precision + recall)
  out <- t(as.matrix(c(sensitivity, specificity, c.rate, d, Fmeasure)))
  colnames(out) <- c("sensitivity", "specificity", "c.rate", "distance", "F-score")
  out
}
y3.mod <- glm(y3 ~ x1 + x2 + x3 + x4 + x5 + x6, family=binomial())
par(mfrow=c(1,1))
s = seq(.01,.99,length=100)
OUT = matrix(0,100,5)
for(i in 1:100) OUT[i,]=perf(s[i],y3.mod,y3)
plot(s,OUT[,1],xlab="Cutoff",ylab="Value",cex.lab=1.5,cex.axis=1.5,ylim=c(0,1),type="l",lwd=2,axes=FALSE,col=2)
axis(1,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.)
axis(2,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.)
lines(s,OUT[,2],col="darkgreen",lwd=2)
lines(s,OUT[,3],col=4,lwd=2)
lines(s,OUT[,4],col="darkred",lwd=2)
lines(s,OUT[,5],col="black",lwd=2)
grid()
box()
legend("topleft",col=c(2,"darkgreen",4,"darkred","black"),lwd=c(2,2,2,2,2),c("Sensitivity","Specificity","Classification Rate","Distance","F-score"))
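If you would rather read the break-even point off numerically than from the plot, a small helper can pick the grid cutoff where sensitivity and specificity are closest (the name `break_even` is mine; `s` and `OUT` are the cutoff grid and performance matrix built above):

```r
# Given a vector of cutoffs and the corresponding performance matrix
# (column 1 = sensitivity, column 2 = specificity, as built in the loop
# above), return the cutoff where the two curves are closest, i.e. the
# break-even point seen on the plot.
break_even <- function(cutoffs, perf_mat) {
  cutoffs[which.min(abs(perf_mat[, 1] - perf_mat[, 2]))]
}
# e.g. break_even(s, OUT)
```

You could equally take `s[which.min(OUT[, 4])]`, the cutoff minimizing the distance column; the two criteria usually land close together.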
Best Answer
To even think about calculating the SD from a single observation, you would have to know the sampling distribution of the measure, and that is not known for the majority of predictive models.
Thus, you are left with non-parametric ways of estimating the SD; for instance, you can cross-validate the model and then use the vector of precision/recall/F/accuracy values over the folds.
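A minimal sketch of that cross-validation idea, where all the names (`cv_metric_sd`, `fit_fun`, `metric_fun`) are hypothetical placeholders you would replace with your own model and metric:

```r
# Estimate the spread of a metric by k-fold cross-validation:
# fit on k-1 folds, score on the held-out fold, and take the SD
# of the per-fold metric values.
#   dat        - data frame of observations
#   fit_fun    - function(training_data) -> fitted model
#   metric_fun - function(model, test_data) -> scalar metric
cv_metric_sd <- function(dat, k = 10, fit_fun, metric_fun) {
  folds <- sample(rep(1:k, length.out = nrow(dat)))
  vals <- sapply(1:k, function(i) {
    mod <- fit_fun(dat[folds != i, , drop = FALSE])
    metric_fun(mod, dat[folds == i, , drop = FALSE])
  })
  c(mean = mean(vals), sd = sd(vals))
}
```

The returned SD over folds is a rough, non-parametric stand-in for the sampling variability of the metric.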