Solved – In calculating the F-measure with precision and recall, why is the harmonic mean used?

harmonic mean, mathematical-statistics

The article for F-measure in Wikipedia says:

The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall:
$F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$

Why is the harmonic mean used in particular, and not the arithmetic mean, the geometric mean, or any other type of average?

What exactly does it mean to calculate a harmonic mean?

Best Answer

The F-measure is often used as an evaluation metric in natural language processing. In particular, it was employed by the Message Understanding Conference (MUC) to evaluate named entity recognition (NER) tasks. Quoting directly from "A survey of named entity recognition and classification" by D. Nadeau:

The harmonic mean of two numbers is never higher than the geometrical mean. It also tends towards the least number, minimizing the impact of large outliers and maximizing the impact of small ones. The F-measure therefore tends to privilege balanced systems.
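
To make the quoted claim concrete, here is a small Python sketch that compares the three means on a few precision/recall pairs. The function names and the sample values are my own, chosen purely for illustration; they do not come from the survey or from MUC. Returning 0 when both inputs are 0 is a common convention that I am assuming here, not something the formula itself defines.

```python
import math


def arithmetic_mean(p, r):
    """Plain average of precision and recall."""
    return (p + r) / 2


def geometric_mean(p, r):
    """Square root of the product of precision and recall."""
    return math.sqrt(p * r)


def harmonic_mean(p, r):
    """Harmonic mean of precision and recall, i.e. the F1 score.

    Algebraically 2 / (1/p + 1/r) = 2*p*r / (p + r).
    Assumption: return 0.0 when p + r == 0 to avoid dividing by zero.
    """
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Illustrative (made-up) precision/recall pairs: balanced, skewed, very skewed.
pairs = [(0.5, 0.5), (0.9, 0.1), (1.0, 0.01)]

for p, r in pairs:
    print(
        f"p={p:.2f} r={r:.2f}  "
        f"arithmetic={arithmetic_mean(p, r):.3f}  "
        f"geometric={geometric_mean(p, r):.3f}  "
        f"harmonic={harmonic_mean(p, r):.3f}"
    )
```

For the balanced pair all three means agree. For precision 0.9 and recall 0.1, the arithmetic mean is still 0.5, the geometric mean drops to about 0.3, and the harmonic mean drops to 0.18. So a system cannot get a high F1 by doing well on only one of the two measures, which is the sense in which the F-measure privileges balanced systems.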