I wrote a function for this purpose, based on the exercise in the book Data Mining with R:
# Function: evaluation metrics
## True positives (TP) - correctly identified as success
## True negatives (TN) - correctly identified as failure
## False positives (FP) - failure incorrectly identified as success
## False negatives (FN) - success incorrectly identified as failure
## Precision - P = TP/(TP+FP): of the cases identified as success, how many really are
## Recall - R = TP/(TP+FN): of the actual successes, how many were identified
## F-score - F = (2 * P * R)/(P + R): harmonic mean of precision and recall
prf <- function(predAct){
  ## predAct is a two-column data frame of predicted and actual labels
  preds <- predAct[, 1]
  trues <- predAct[, 2]
  xTab <- table(preds, trues)  # rows = predicted, columns = actual
  clss <- as.character(sort(unique(preds)))
  r <- matrix(NA, ncol = 7, nrow = 1,
              dimnames = list(c(), c('Acc',
                                     paste("P", clss[1], sep = '_'),
                                     paste("R", clss[1], sep = '_'),
                                     paste("F", clss[1], sep = '_'),
                                     paste("P", clss[2], sep = '_'),
                                     paste("R", clss[2], sep = '_'),
                                     paste("F", clss[2], sep = '_'))))
  r[1,1] <- sum(xTab[1,1], xTab[2,2]) / sum(xTab)         # Accuracy
  r[1,2] <- xTab[1,1] / sum(xTab[1,])                     # Miss Precision
  r[1,3] <- xTab[1,1] / sum(xTab[,1])                     # Miss Recall
  r[1,4] <- (2 * r[1,2] * r[1,3]) / sum(r[1,2], r[1,3])   # Miss F
  r[1,5] <- xTab[2,2] / sum(xTab[2,])                     # Hit Precision
  r[1,6] <- xTab[2,2] / sum(xTab[,2])                     # Hit Recall
  r[1,7] <- (2 * r[1,5] * r[1,6]) / sum(r[1,5], r[1,6])   # Hit F
  r
}
For any binary classification task, this returns the precision, recall, and F-score for each class, plus the overall accuracy, like so:
> pred <- rbinom(100,1,.7)
> act <- rbinom(100,1,.7)
> predAct <- data.frame(pred,act)
> prf(predAct)
Acc P_0 R_0 F_0 P_1 R_1 F_1
[1,] 0.63 0.34375 0.4074074 0.3728814 0.7647059 0.7123288 0.7375887
Calculating the P, R, and F for each class like this lets you see whether one class or the other is giving you more difficulty, and it's then easy to calculate the overall P, R, and F stats. I haven't used the ROCR package, but you could derive the same ROC curves by varying the classifier's decision threshold (or some other parameter) over a range and calling the function for the classifier at points along that range.
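To make that threshold-sweeping idea concrete, here is a minimal sketch (not using ROCR; the scores and labels are made-up stand-ins): sweep a decision threshold over predicted probabilities and record the true-positive and false-positive rates at each point, which traces out an ROC-style curve.

```r
# Hypothetical classifier scores and true labels, for illustration only
set.seed(1)
probs <- runif(100)              # predicted probability of class 1
act   <- rbinom(100, 1, probs)   # actual labels, correlated with the scores

# Sweep the threshold and tabulate hit/false-alarm rates at each point
roc <- t(sapply(seq(0.05, 0.95, by = 0.05), function(th) {
  pred <- as.integer(probs >= th)
  tpr  <- sum(pred == 1 & act == 1) / sum(act == 1)  # recall / sensitivity
  fpr  <- sum(pred == 1 & act == 0) / sum(act == 0)  # 1 - specificity
  c(threshold = th, TPR = tpr, FPR = fpr)
}))

# plot(roc[, "FPR"], roc[, "TPR"], type = "b")  # ROC curve
```

As the threshold rises, fewer cases are called positive, so both TPR and FPR fall; plotting TPR against FPR gives the curve.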
Best Answer
Assuming we know the sample size $N$, we can get the Accuracy from knowing Precision and Recall. Precision is defined as $\frac{TP}{TP+FP}$ and Recall is defined as $\frac{TP}{TP+FN}$, where $TP$ is the number of True Positives, $FP$ is the number of False Positives, and $FN$ is the number of False Negatives. Given that $N = TP+TN+FP+FN$, the only thing we do not know is the number of True Negatives, $TN$. We can solve for $TN = N-(TP+FP+FN)$, and given that we can calculate the Accuracy as $\frac{TP+TN}{N}$. If we do not know the total number of samples examined, $N$, then we are stuck.
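As a quick sanity check of that derivation, here is a small worked sketch with made-up counts (the numbers are illustrative assumptions, not from the thread); note it also uses $TP$, since the ratios alone do not pin down the counts:

```r
# Hypothetical values: N samples, TP known, plus Precision and Recall
N  <- 250
TP <- 40
P  <- 0.5    # Precision = TP / (TP + FP)
R  <- 0.25   # Recall    = TP / (TP + FN)

FP  <- TP / P - TP          # invert the Precision formula: FP = 40
FN  <- TP / R - TP          # invert the Recall formula:    FN = 120
TN  <- N - (TP + FP + FN)   # TN = 50
Acc <- (TP + TN) / N        # Accuracy = 90/250 = 0.36
```

The same algebra fails without $N$: $TP$, $FP$, and $FN$ alone leave $TN$ undetermined.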
In general, Precision and Recall (and their harmonic mean, the $F_1$ score) are indeed intuitive measures, but they do not account for the correct classification of negative examples (True Negatives), and on certain occasions that is inconvenient (or outright misleading).