Precision vs. Recall: Acceptable Limits

classification, machine learning, precision-recall

I have a few questions about precision and recall in machine learning classification.

While reading, I found that high precision tends to result in low recall, and vice versa.

  1. If someone asks what percentage of precision is acceptable, what would the answer be?

  2. Should (or can) recall be greater than precision? Or should (or can) precision be greater than recall?

  3. For precision to be considered good, should it be above 95%? Likewise, should recall be above 95%?

  4. In my test results I got 100% recall. Can recall or precision be greater than 100%?

Best Answer

To explain this, I will use an example. I trained a model that classifies fruit as banana or not banana. To evaluate the model, I use 20 pieces of fruit: 10 bananas and 10 other fruits. Of the bananas, 8 were classified as bananas and the other 2 as "other fruits". Of the fruits that aren't bananas, 7 were classified as "other fruits" and the other 3 as bananas.

If "banana" is our positive class and "other fruits" our negative class, we will have 8 True Positives, 2 False Negatives, 7 True Negatives and 3 False Positives.

Given this, we can calculate the precision as True Positives / (True Positives + False Positives), or in my own words, the proportion of samples predicted as positive that really are positive. The recall can be calculated as True Positives / (True Positives + False Negatives), or in my own words, the proportion of positive samples that have been identified as positive.
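To make the arithmetic concrete, here is a minimal Python sketch that applies the two formulas to the banana counts above. The helper name `precision_recall` is only illustrative, and returning 0.0 when a denominator is zero is an assumed convention, not something stated in the answer.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from confusion-matrix counts."""
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Counts from the banana example: 8 TP, 2 FN, 7 TN, 3 FP
tp, fn, tn, fp = 8, 2, 7, 3

precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.3f}")  # 8 / (8 + 3) = 0.727
print(f"recall    = {recall:.3f}")     # 8 / (8 + 2) = 0.800
```

Note that the True Negatives do not appear in either formula, so the 7 correctly rejected fruits have no effect on precision or recall.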

So we can now answer the 4th question: precision and recall can never be greater than 1 (that is, 100%), because the denominator of each equation is always greater than or equal to the numerator.
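Written out with the usual abbreviations (TP, FP, FN for True Positives, False Positives and False Negatives), the bound follows because the counts FP and FN can never be negative. The numeric values are the banana example from above:

```latex
\[
\text{Precision} = \frac{TP}{TP + FP} \le 1,
\qquad
\text{Recall} = \frac{TP}{TP + FN} \le 1,
\qquad
\text{since } FP \ge 0 \text{ and } FN \ge 0.
\]
\[
\text{Precision} = \frac{8}{8 + 3} = \frac{8}{11} \approx 0.727,
\qquad
\text{Recall} = \frac{8}{8 + 2} = \frac{8}{10} = 0.8.
\]
```

A recall of exactly 100% is possible: it happens when FN = 0, meaning no positive sample was missed.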

Question 1 is difficult to answer because it depends on the problem. In many cases, you will need a precision or recall greater than 0.9 to be confident that you are predicting correctly, but in other cases lower values are acceptable. This question doesn't have a single answer.

Question 2, again, depends on the problem. In some cases, you will want high precision even if it means lower recall, because you want every positive prediction to be truly positive (think of a system that predicts whether a person has cancer: you don't want to give chemo to a healthy person). In other cases, you will want high recall, because it means that few or no positive samples are classified as negative (think of a computer virus detector: it's better to flag a harmless file as dangerous than to let a virus infect the computer).
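The trade-off between the two metrics can be sketched by moving the decision threshold of a classifier that outputs scores. The scores and labels below are made up purely for illustration, and the function name `precision_recall_at` is hypothetical. Recall can only stay the same or drop as the threshold rises; precision usually rises, but that is a tendency, not a guarantee.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical classifier scores and true labels (1 = positive, 0 = negative)
scores = [0.95, 0.90, 0.85, 0.70, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

results = {t: precision_recall_at(t, scores, labels) for t in (0.8, 0.5, 0.15)}
for t, (p, r) in results.items():
    print(f"threshold={t:<4}  precision={p:.3f}  recall={r:.3f}")
```

With this toy data, the strict threshold (0.8) gives perfect precision but misses half of the positives, while the lenient threshold (0.15) finds every positive at the cost of more false alarms. Which end to favour is the problem-dependent choice described above.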

Question 3 has the same answer as the previous two questions: it depends on the problem.