Machine Learning – Why is Balanced Accuracy an Arithmetic Mean Instead of Harmonic?

Tags: classification, confusion matrix, f1, machine learning

The F1 score is the harmonic mean of precision (Positive Predictive Value) and sensitivity/recall (True Positive Rate). I understand that we use the harmonic mean there in order to penalize extreme values of either component, and because the harmonic mean tends to be more appropriate than the arithmetic mean for averaging rates. See e.g. Why we don't use weighted arithmetic mean instead of harmonic mean?.

However, the Balanced Accuracy score is simply an arithmetic mean of sensitivity/recall (True Positive Rate) and specificity (True Negative Rate). Why do we conventionally use the arithmetic mean for this quantity and not the harmonic mean?
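For reference, these are the two standard definitions being compared, written in terms of precision (PPV), sensitivity (TPR) and specificity (TNR):

$$
F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}} \quad \text{(harmonic mean)},
\qquad
\mathrm{BA} = \frac{\mathrm{TPR} + \mathrm{TNR}}{2} \quad \text{(arithmetic mean)}.
$$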

Best Answer

For ordinary accuracy, we collect all the correct predictions in the numerator and divide by the total number of samples. This does not account for class imbalance. If we balance the classes by up-weighting the minority class so that its total weight equals the total weight of the majority class, the accuracy computed on the re-weighted data is called balanced accuracy. In other words, we first balance the classes and then apply the usual accuracy formula, and this turns out to be exactly the arithmetic mean of specificity and sensitivity.
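To see the equivalence, write $P = TP + FN$ for the number of positive samples and $N = TN + FP$ for the number of negative samples, and give each positive sample weight $N/P$ so that both classes carry total weight $N$:

$$
\text{balanced accuracy}
= \frac{\tfrac{N}{P}\,TP + TN}{\tfrac{N}{P}\,P + N}
= \frac{\tfrac{N}{P}\,TP + TN}{2N}
= \frac{1}{2}\left(\frac{TP}{P} + \frac{TN}{N}\right)
= \frac{\text{sensitivity} + \text{specificity}}{2}.
$$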

Example

Let's say we have TP=20, FP=30, TN=70, FN=30, so there are 50 Positive samples and 100 Negative samples in the dataset. If we balance the dataset by counting each positive sample twice, we get TP=40 and FN=60, and the usual accuracy becomes (40+70)/(100+100) = 110/200 = 11/20.

If we had instead computed the arithmetic mean of sensitivity, TP/(TP+FN) = 20/(20+30) = 2/5, and specificity, TN/(TN+FP) = 70/(70+30) = 7/10, we would get (2/5 + 7/10)/2 = 11/20, the same value. That is why the arithmetic mean of specificity and sensitivity is called balanced accuracy, while their harmonic mean is a different quantity.
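Here is a minimal Python sketch of the same check, using the counts from the example above; the cross-check against scikit-learn's balanced_accuracy_score assumes scikit-learn is installed and is only there to confirm the arithmetic:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# Confusion-matrix counts from the example above
tp, fp, tn, fn = 20, 30, 70, 30

sensitivity = tp / (tp + fn)          # 20/50 = 0.4
specificity = tn / (tn + fp)          # 70/100 = 0.7
mean_of_rates = (sensitivity + specificity) / 2   # 0.55 = 11/20

# "Balance by duplication": count each positive sample twice so both
# classes carry equal total weight, then compute plain accuracy.
duplicated_accuracy = (2 * tp + tn) / (2 * (tp + fn) + (tn + fp))  # (40+70)/200 = 0.55

# Cross-check by expanding the confusion matrix into explicit label vectors.
y_true = np.array([1] * (tp + fn) + [0] * (tn + fp))
y_pred = np.array([1] * tp + [0] * fn + [0] * tn + [1] * fp)
sklearn_ba = balanced_accuracy_score(y_true, y_pred)

print(mean_of_rates, duplicated_accuracy, sklearn_ba)  # all three print 0.55
```

All three numbers agree, which is just the equivalence derived above evaluated on this particular confusion matrix.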