Machine Learning – Calculating Average of Precision and Recall

machine-learning, precision-recall

A machine learning model is outputting precision and recall for a two-class classification problem (0 and 1) like this:

Confusion matrix:
[[136  21]
 [ 41   6]]

Precision: [0.768 0.128]
Recall: [0.866 0.222]
Accuracy: 0.696

There are two values each for precision and recall: the first for the 0 class and the second for the 1 class. Is it okay to take the average of these, e.g. report the overall precision as (0.768 + 0.128) / 2 = 0.448, and similarly for recall?

Best Answer

WARNING: the average of precision/recall is a completely different concept from Average Precision (AP).

Based on the question, this answer is about the average of precision and recall.

You are partially correct. If

         Predicted 0   Predicted 1
True 0  [[136            21]         [[TP  FN]
True 1  [ 41              6]]        [ FP  TN]]

Then the precision for each class i is M_ii / Σ_j M_ji (the diagonal entry divided by its column sum), so:

class 0: 136 / (136 + 41) ≈ 0.768

class 1: 6 / (6 + 21) ≈ 0.222

For recall the same happens, but the denominator is the row sum instead, i.e. M_ii / Σ_j M_ij:

class 0: 136 / (136 + 21) ≈ 0.866

class 1: 6 / (6 + 41) ≈ 0.128

Then you can average over the classes to get an overall precision and recall.
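
With the numbers above, that gives an averaged precision of (0.768 + 0.222) / 2 ≈ 0.495 and an averaged recall of (0.866 + 0.128) / 2 ≈ 0.497.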

Check Table III of this paper, where the macro-averaged versions are referred to as Precision_M and Recall_M.


More precisely, you are doing macro-averaging.

In code (using NumPy and scikit-learn's confusion_matrix), you can do:

import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(labels, predictions)

# Per-class recall: diagonal divided by row sums (true-class totals)
recall = np.diag(cm) / np.sum(cm, axis=1)
# Per-class precision: diagonal divided by column sums (predicted-class totals)
precision = np.diag(cm) / np.sum(cm, axis=0)

# Overall (macro-averaged) precision/recall
np.mean(precision)
np.mean(recall)
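
As a cross-check, scikit-learn can compute the same macro averages directly with precision_score and recall_score and average='macro'. The sketch below is self-contained; the labels and predictions arrays are hypothetical stand-ins constructed only to reproduce the confusion matrix from the question.

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical arrays that reproduce the matrix [[136 21] [41 6]]
labels      = np.array([0] * 157 + [1] * 47)
predictions = np.array([0] * 136 + [1] * 21 + [0] * 41 + [1] * 6)

print(confusion_matrix(labels, predictions))
# [[136  21]
#  [ 41   6]]

print(precision_score(labels, predictions, average=None))     # per class: [0.768 0.222]
print(recall_score(labels, predictions, average=None))        # per class: [0.866 0.128]
print(precision_score(labels, predictions, average='macro'))  # ~0.495
print(recall_score(labels, predictions, average='macro'))     # ~0.497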
