The correct expression of the Hellinger Distance equation

distance, distance-functions, hellinger

I am aware there are various ways to calculate the Hellinger Distance (H) depending on the context and data. One of these ways, as I understand, is via the Bhattacharyya coefficient (BC). For discrete distributions, $H=\sqrt{1-BC} $ where $BC=\sum_{i=1}^n \sqrt{p_i q_i} $. Hence we have:

$$H=\sqrt{1-\sum_{i=1}^n \sqrt{p_i q_i}} $$

However, I have found some expressions of the Hellinger Distance equation that include a factor of 2 (see page 302, here), in the form:

$$H=2\sqrt{1-\sum_{i=1}^n \sqrt{p_i q_i}} $$
This is equivalent to $H=2\sqrt{1-BC} $ found in a Cross Validated question here.

So which version of the Hellinger Equation is correct? Or am I missing something? A factor of two in a distance measure is hardly a trivial difference.

Best Answer

This doesn't really answer the question, but it may be helpful anyway.

All applications of the Hellinger distance I can think of are invariant to whether the definition includes the factor of 2, provided that quantities such as threshold values are rescaled by the same factor. Obviously, whichever version is used must be used consistently, so it is advisable to state the formula explicitly whenever the Hellinger distance is used.
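To illustrate this invariance, here is a minimal Python sketch. The function names, the `scale` parameter, and the example distributions are made up for illustration and are not taken from any particular library. It computes both versions from the question and checks that ranking a few candidate distributions by distance to a reference gives the same order either way.

```python
import math


def bhattacharyya_coefficient(p, q):
    """BC = sum_i sqrt(p_i * q_i) for two discrete distributions."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def hellinger(p, q, scale=1.0):
    """scale * sqrt(1 - BC); `scale` is a hypothetical knob for the convention.

    scale=1.0 gives H = sqrt(1 - BC); scale=2.0 gives H = 2 * sqrt(1 - BC).
    """
    bc = bhattacharyya_coefficient(p, q)
    # Clamp at zero: floating-point round-off can push 1 - BC slightly negative.
    return scale * math.sqrt(max(0.0, 1.0 - bc))


# Illustrative (made-up) distributions.
reference = [0.5, 0.3, 0.2]
candidates = {
    "a": [0.4, 0.4, 0.2],
    "b": [0.1, 0.1, 0.8],
    "c": [0.5, 0.3, 0.2],
}


def ranking(scale):
    """Candidate names sorted from closest to farthest from the reference."""
    return sorted(
        candidates,
        key=lambda name: hellinger(reference, candidates[name], scale),
    )


ranking_unscaled = ranking(1.0)
ranking_doubled = ranking(2.0)

print(ranking_unscaled)
print(ranking_doubled)
```

Any decision rule of the form "distance below a threshold t" under the first convention corresponds to "distance below 2t" under the second, so only the numerical scale changes, not the conclusions.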

For this reason, most mathematicians would consider the two versions equivalent. There is no consensus about which one is right, and there is no authority that would enforce such a consensus. Most mathematicians would think that no such consensus is needed, as the two are "the same" in all relevant aspects anyway.
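As a small worked illustration of why the versions differ only in scale (using the notation of the question, and assuming $p$ and $q$ are probability vectors so that $\sum_i p_i = \sum_i q_i = 1$):

$$\sum_{i=1}^n \left(\sqrt{p_i}-\sqrt{q_i}\right)^2 = \sum_{i=1}^n p_i + \sum_{i=1}^n q_i - 2\sum_{i=1}^n \sqrt{p_i q_i} = 2\,(1-BC)$$

Since $0 \le BC \le 1$, the version $\sqrt{1-BC}$ takes values in $[0,1]$ while $2\sqrt{1-BC}$ takes values in $[0,2]$. Both are positive multiples of the Euclidean distance between the vectors $(\sqrt{p_i})$ and $(\sqrt{q_i})$, and a positive multiple of a metric is again a metric, so the choice of constant affects only the scale.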

Historically, one way for such a situation to emerge (I don't know about the Hellinger distance in particular) is that somebody defines a concept originally, and somebody else then discovers that the same concept, multiplied by a constant factor not present in the original definition, emerges nicely out of some theoretical considerations that help a lot in motivating the concept. After that, there is a reason to see both versions as legitimate.

Generally, a mathematical way of looking at such things is that names and notation should not be taken as having a generally agreed meaning but rather they should be explicitly defined when used and then they are what they are defined to be, in the specific place.

It has to be admitted though that there are limitations to this attitude. Work of a certain complexity cannot define everything from scratch for pragmatic reasons, and non-mathematicians are understandably often baffled by the same name apparently not referring to the (exactly) same thing. So certain conventions are required and some exist (too many from the point of view of some pure mathematicians; not enough from the point of view of many other people).

As another example, I am personally annoyed that the BIC and AIC, as used for model selection, appear in some literature in the positive form and in other literature in the negative form, so that in one case "larger is better" and in the other "smaller is better". Authors certainly need to tell readers explicitly which version they use, but in many places this is not done, and the reader has to guess from the reported results which one it is.