I wrote a function for this purpose, based on the exercise in the book Data Mining with R:
# Function: evaluation metrics
## True positives (TP) - correctly identified as success
## True negatives (TN) - correctly identified as failure
## False positives (FP) - failure incorrectly identified as success
## False negatives (FN) - success incorrectly identified as failure
## Precision - P = TP/(TP+FP) - of those predicted as a class, how many actually are
## Recall - R = TP/(TP+FN) - of the actual members of a class, how many were identified
## F-score - F = (2 * P * R)/(P + R) - harmonic mean of precision and recall
prf <- function(predAct) {
  ## predAct is a two-column data frame of predicted, actual
  preds <- predAct[, 1]
  trues <- predAct[, 2]
  xTab <- table(preds, trues)  # rows = predicted, columns = actual
  clss <- as.character(sort(unique(preds)))
  r <- matrix(NA, ncol = 7, nrow = 1,
              dimnames = list(c(), c('Acc',
                                     paste("P", clss[1], sep = '_'),
                                     paste("R", clss[1], sep = '_'),
                                     paste("F", clss[1], sep = '_'),
                                     paste("P", clss[2], sep = '_'),
                                     paste("R", clss[2], sep = '_'),
                                     paste("F", clss[2], sep = '_'))))
  r[1, 1] <- sum(xTab[1, 1], xTab[2, 2]) / sum(xTab)           # Accuracy
  r[1, 2] <- xTab[1, 1] / sum(xTab[1, ])                       # Miss Precision (TP / predicted miss)
  r[1, 3] <- xTab[1, 1] / sum(xTab[, 1])                       # Miss Recall (TP / actual miss)
  r[1, 4] <- (2 * r[1, 2] * r[1, 3]) / sum(r[1, 2], r[1, 3])   # Miss F
  r[1, 5] <- xTab[2, 2] / sum(xTab[2, ])                       # Hit Precision (TP / predicted hit)
  r[1, 6] <- xTab[2, 2] / sum(xTab[, 2])                       # Hit Recall (TP / actual hit)
  r[1, 7] <- (2 * r[1, 5] * r[1, 6]) / sum(r[1, 5], r[1, 6])   # Hit F
  r
}
For any binary classification task, this returns the precision, recall, and F-score for each class, plus the overall accuracy, like so:
> pred <- rbinom(100,1,.7)
> act <- rbinom(100,1,.7)
> predAct <- data.frame(pred,act)
> prf(predAct)
Acc P_0 R_0 F_0 P_1 R_1 F_1
[1,] 0.63 0.34375 0.4074074 0.3728814 0.7647059 0.7123288 0.7375887
Calculating P, R, and F for each class like this lets you see whether one class or the other is giving you more difficulty, and it is then easy to compute the overall (e.g., macro-averaged) P, R, and F. I haven't used the ROCR package, but you could derive the same ROC curves by training the classifier over the range of some parameter and calling the function for the classifiers at points along that range.
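As a minimal sketch of that overall calculation, here is a macro-average helper. It assumes a one-row result matrix with the column names shown in the example output above (classes 0 and 1); the name `macro_prf` is mine, not part of the original code:

```r
# Macro-averaged P, R, and F: the unweighted mean of the per-class values.
# Assumes `res` is a 1-row matrix with columns named like the prf() output
# above (P_0, R_0, F_0, P_1, R_1, F_1).
macro_prf <- function(res) {
  c(P = mean(res[1, c("P_0", "P_1")]),
    R = mean(res[1, c("R_0", "R_1")]),
    F = mean(res[1, c("F_0", "F_1")]))
}
```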
There is an excellent post (Obtaining predicted values (Y=1 or 0) from a logistic regression model fit) about the break-even point of sensitivity and specificity. Sensitivity is the same as recall, and while specificity is not the same as precision, it should be easy to generalize from there.
If you look at the plot, you will see a point where the metrics cross; this is your optimal cutoff point.
EDIT: I have updated the code to include precision, recall, and the F-score.
perf <- function(cut, mod, y) {
  yhat <- as.numeric(mod$fitted.values > cut)
  w <- which(y == 1)
  sensitivity <- mean(yhat[w] == 1)
  specificity <- mean(yhat[-w] == 0)
  c.rate <- mean(y == yhat)
  # Distance from the ideal corner (sensitivity, specificity) = (1, 1)
  d <- cbind(sensitivity, specificity) - c(1, 1)
  d <- sqrt(d[1]^2 + d[2]^2)
  # F-score
  retrieved <- sum(yhat)
  precision <- sum(yhat & y) / retrieved
  recall <- sum(yhat & y) / sum(y)
  Fmeasure <- 2 * precision * recall / (precision + recall)
  out <- t(as.matrix(c(sensitivity, specificity, c.rate, d, Fmeasure)))
  colnames(out) <- c("sensitivity", "specificity", "c.rate", "distance", "F-score")
  out
}
y3.mod <- glm(y3 ~ x1 + x2 + x3 + x4 + x5 + x6, family=binomial())
par(mfrow=c(1,1))
s = seq(.01,.99,length=100)
OUT = matrix(0,100,5)
for(i in 1:100) OUT[i,]=perf(s[i],y3.mod,y3)
plot(s,OUT[,1],xlab="Cutoff",ylab="Value",cex.lab=1.5,cex.axis=1.5,ylim=c(0,1),type="l",lwd=2,axes=FALSE,col=2)
axis(1,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.)
axis(2,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.)
lines(s,OUT[,2],col="darkgreen",lwd=2)
lines(s,OUT[,3],col=4,lwd=2)
lines(s,OUT[,4],col="darkred",lwd=2)
lines(s,OUT[,5],col="black",lwd=2)
grid()
box()
legend("topleft",col=c(2,"darkgreen",4,"darkred","black"),lwd=c(2,2,2,2,2),c("Sensitivity","Specificity","Classification Rate","Distance","F-score"))
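If you would rather read the break-even point off numerically than from the plot, a small helper can pick the grid cutoff where sensitivity and specificity are closest (the name `break_even` is mine; `s` and `OUT` are the cutoff grid and performance matrix built above):

```r
# Given a vector of cutoffs and the corresponding performance matrix
# (column 1 = sensitivity, column 2 = specificity, as built in the loop
# above), return the cutoff where the two curves are closest, i.e. the
# break-even point seen on the plot.
break_even <- function(cutoffs, perf_mat) {
  cutoffs[which.min(abs(perf_mat[, 1] - perf_mat[, 2]))]
}
# e.g. break_even(s, OUT)
```

You could equally take `s[which.min(OUT[, 4])]`, the cutoff minimizing the distance column; the two criteria usually land close together.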
Best Answer
To even think about calculating the SD from a single observation, you would have to know the sampling distribution of the measure, and that is not known for the majority of predictive models.
Thus, you are left with non-parametric ways of estimating the SD; for instance, you can cross-validate the model and then use the vector of precision/recall/F/accuracy values over the folds.
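A minimal sketch of that cross-validation idea, where all the names (`cv_metric_sd`, `fit_fun`, `metric_fun`) are hypothetical placeholders you would replace with your own model and metric:

```r
# Estimate the spread of a metric by k-fold cross-validation:
# fit on k-1 folds, score on the held-out fold, and take the SD
# of the per-fold metric values.
#   dat        - data frame of observations
#   fit_fun    - function(training_data) -> fitted model
#   metric_fun - function(model, test_data) -> scalar metric
cv_metric_sd <- function(dat, k = 10, fit_fun, metric_fun) {
  folds <- sample(rep(1:k, length.out = nrow(dat)))
  vals <- sapply(1:k, function(i) {
    mod <- fit_fun(dat[folds != i, , drop = FALSE])
    metric_fun(mod, dat[folds == i, , drop = FALSE])
  })
  c(mean = mean(vals), sd = sd(vals))
}
```

The returned SD over folds is a rough, non-parametric stand-in for the sampling variability of the metric.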