[Math] Interpretation of Hellinger distance

probability, reference-request, statistics

Given two discrete probability distributions $\mathbf{p}:=(p_1,p_2,\dots,p_n)$ and $\mathbf{q}:=(q_1,q_2,\dots,q_n)$, the Hellinger distance between $\mathbf{p}$ and $\mathbf{q}$ is defined as:
$$
d_H(\mathbf{p},\mathbf{q}):=\frac{1}{\sqrt{2}}\left\|\mathbf{p}^{1/2}-\mathbf{q}^{1/2}\right\|_2=\frac{1}{\sqrt{2}}\left(\sum_{i=1}^n \left(\sqrt{p_i}-\sqrt{q_i}\right)^2\right)^{1/2}.
$$

Why is this distance extensively exploited in statistics and probability? What is the geometrical/statistical interpretation of this distance? Assuming that $\mathbf{p},\mathbf{q}$ represent vectors and not probability distributions, has this distance been studied in other areas different from statistics?

My questions are not technical, but I was not able to find references which clearly address them.

Thank you for your help.

Best Answer

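From my understanding, with the normalization used in the question the distance can be rewritten as

$$ d_H(\mathbf{p},\mathbf{q}) = \sqrt{1 - \sum_{i=1}^n\sqrt{p_iq_i}} $$

Cha (2007) uses a different scaling and gives the form $2\sqrt{1 - \sum_{i=1}^n\sqrt{p_iq_i}}$, which is twice the $d_H$ defined in the question.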
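As a sketch of where this form comes from, assume the $1/\sqrt{2}$ normalization from the question and that $\mathbf{p}$ and $\mathbf{q}$ each sum to 1. Expanding the square gives:

$$
d_H(\mathbf{p},\mathbf{q})^2=\frac{1}{2}\sum_{i=1}^n\left(\sqrt{p_i}-\sqrt{q_i}\right)^2=\frac{1}{2}\sum_{i=1}^n\left(p_i+q_i-2\sqrt{p_iq_i}\right)=1-\sum_{i=1}^n\sqrt{p_iq_i}
$$

So the "1" is just $\frac{1}{2}\left(\sum_i p_i+\sum_i q_i\right)$, and the outer square root undoes the square on the left-hand side. Definitions that scale the distance differently only change the constant in front.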

The key part is the sum below. Each term $\sqrt{p_iq_i}$ is the geometric mean of $p_i$ and $q_i$, and the whole sum is known as the Bhattacharyya coefficient. The geometric mean is useful for comparing values with different ranges: it gives a central value based on the product of the two probabilities, rather than their arithmetic midpoint.

$$ \sum_{i=1}^n\sqrt{p_iq_i} $$

To compare two probability distributions, the probabilities that both distributions assign to each outcome are plugged into the formula. When there is no overlap (for every outcome, at least one of $p_i$ and $q_i$ is 0), the sum is 0 and the distance takes its maximum value. When some outcomes have nonzero probability under both distributions, there is overlap and the distance is smaller than the maximum.
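The two extreme cases described above can be checked numerically. The following is a minimal sketch, assuming the $1/\sqrt{2}$ normalization from the question (so the maximum distance is 1); the function names `hellinger` and `bhattacharyya_coefficient` and the example vectors are illustrative choices, not taken from Cha (2007).

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum of the geometric means sqrt(p_i * q_i) over all outcomes."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def hellinger(p, q):
    """Hellinger distance with the 1/sqrt(2) normalization (assumed here)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2.0)


# Illustrative distributions (hypothetical data).
p = [0.5, 0.5, 0.0, 0.0]
q_disjoint = [0.0, 0.0, 0.5, 0.5]      # no outcome has mass under both
q_overlap = [0.25, 0.25, 0.25, 0.25]   # shares the first two outcomes with p

print(hellinger(p, p))           # identical distributions: distance 0
print(hellinger(p, q_disjoint))  # no overlap: approximately 1, the maximum
print(hellinger(p, q_overlap))   # partial overlap: strictly between 0 and 1
```

For the overlapping pair, the value agrees (up to floating-point rounding) with `math.sqrt(1 - bhattacharyya_coefficient(p, q_overlap))`, which is the rewritten form of the distance under this normalization.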

From this point I do not understand why the sum of geometric means is subtracted from 1 and a second square root is taken. I do know that the formula is considered useful for high-dimensional data and for distributions skewed by class imbalance: the distance should be insensitive to skewed data. For example, within computer science one application of the Hellinger distance is anomaly detection.

Hopefully, someone else can contribute to answer these open questions.

Cha, Sung-Hyuk, Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions, 2007.
