Solved – Intuition of the Bhattacharyya coefficient and the Bhattacharyya distance

bhattacharyya, distance, distance-functions, intuition, mathematical-statistics

The Bhattacharyya distance is defined as $D_B(p,q) = -\ln \left( BC(p,q) \right)$, where $BC(p,q) = \sum_{x\in X} \sqrt{p(x) q(x)}$ for discrete variables, and analogously (with an integral) for continuous random variables. I'm trying to gain some intuition as to what this metric tells you about the two probability distributions and when it might be a better choice than KL divergence or Wasserstein distance. (Note: I am aware that KL divergence is not a distance.)
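For concreteness, here is a minimal sketch of how I compute both quantities in the discrete case (the two distributions below are just made-up illustrative values):

```python
import numpy as np

# Two discrete distributions on the same support (illustrative values)
p = np.array([0.36, 0.48, 0.16])
q = np.array([0.30, 0.50, 0.20])

bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient, 0 < BC <= 1
d_b = -np.log(bc)             # Bhattacharyya distance, D_B = -ln(BC)
print(bc, d_b)
```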

Best Answer

The Bhattacharyya coefficient is $$ BC(h,g)= \int \sqrt{h(x) g(x)}\; dx $$ in the continuous case. There is a good Wikipedia article https://en.wikipedia.org/wiki/Bhattacharyya_distance. How to understand this (and the related distance)? Let us start with the multivariate normal case, which is instructive and can be found at the link above. When the two multivariate normal distributions have the same covariance matrix, the Bhattacharyya distance is proportional to the squared Mahalanobis distance, while with two different covariance matrices it gains a second term, and so generalizes the Mahalanobis distance. This may underlie claims that in some cases the Bhattacharyya distance works better than the Mahalanobis. The Bhattacharyya distance is also closely related to the Hellinger distance https://en.wikipedia.org/wiki/Hellinger_distance.
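As a concrete check of the Gaussian case, here is a small sketch using the closed-form expression given in the Wikipedia article linked above; the means and covariance below are made-up illustrative values:

```python
import numpy as np

def bhattacharyya_gaussian(mu1, S1, mu2, S2):
    """Bhattacharyya distance between two multivariate normals
    (closed form from the Wikipedia article linked above)."""
    S = (S1 + S2) / 2
    diff = mu1 - mu2
    mean_term = diff @ np.linalg.solve(S, diff) / 8
    cov_term = 0.5 * np.log(np.linalg.det(S) /
                            np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return mean_term + cov_term

mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
S = np.array([[2.0, 0.3], [0.3, 1.0]])

# With equal covariance matrices the second term vanishes and
# D_B = (1/8) * squared Mahalanobis distance between the means.
d_b = bhattacharyya_gaussian(mu1, S, mu2, S)
mahalanobis_sq = (mu1 - mu2) @ np.linalg.solve(S, mu1 - mu2)
print(d_b, mahalanobis_sq / 8)   # the two numbers agree
```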

Working with the formula above, we can find a stochastic interpretation. Write $$ \DeclareMathOperator{\E}{\mathbb{E}} BC(h,g) = \int \sqrt{h(x) g(x)}\; dx = \int h(x) \sqrt{\frac{g(x)}{h(x)}}\; dx = \E_h \sqrt{\frac{g(X)}{h(X)}}, $$ so it is the expected value of the square root of the likelihood ratio statistic, calculated under the distribution $h$ (the null distribution of $X$). That invites comparison with Intuition on the Kullback-Leibler (KL) Divergence, which interprets the Kullback-Leibler divergence as the expectation of the log-likelihood ratio statistic (but calculated under the alternative $g$). Such a viewpoint might be interesting in some applications.
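Since $BC(h,g)$ is an expectation under $h$, it can be estimated by plain Monte Carlo: draw from $h$ and average the square root of the likelihood ratio. A minimal sketch with two equal-variance normals, where the exact value is $\exp\{-(\mu_1-\mu_2)^2/(8\sigma^2)\}$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# h = N(0, 1) plays the role of the null distribution, g = N(1, 1) the alternative
x = rng.normal(loc=0.0, scale=1.0, size=200_000)   # draws from h

# Monte Carlo estimate of BC(h, g) = E_h[ sqrt(g(X) / h(X)) ]
bc_mc = np.mean(np.sqrt(norm.pdf(x, loc=1.0) / norm.pdf(x, loc=0.0)))

# Closed form for equal-variance normals: BC = exp(-(mu_1 - mu_2)^2 / (8 sigma^2))
bc_exact = np.exp(-1.0 / 8.0)
print(bc_mc, bc_exact)   # the estimate is close to exp(-1/8) ~ 0.8825
```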

Still another viewpoint: compare with the general family of f-divergences (see also Rényi entropy), defined as $$ D_f(h,g) = \int h(x)\, f\!\left( \frac{g(x)}{h(x)}\right)\; dx. $$ If we choose $f(t)= 4\left( \frac{1+t}{2}-\sqrt{t} \right)$, the resulting f-divergence is the Hellinger divergence, from which we can calculate the Bhattacharyya coefficient. This can also be seen as an example of a Rényi divergence, obtained from a Rényi entropy, see the link above.
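Plugging this choice of $f$ into the definition above, and using that both densities integrate to one, gives $$ D_f(h,g) = \int h(x)\, 4\left(\frac{1+\frac{g(x)}{h(x)}}{2} - \sqrt{\frac{g(x)}{h(x)}}\right) dx = 2\int h(x)\,dx + 2\int g(x)\,dx - 4\int \sqrt{h(x) g(x)}\; dx = 4\left(1 - BC(h,g)\right), $$ so the Bhattacharyya coefficient is recovered as $BC(h,g) = 1 - D_f(h,g)/4$, and the Bhattacharyya distance as $-\ln\left(1 - D_f(h,g)/4\right)$.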
