Solved – Differences between Bhattacharyya distance and KL divergence

bhattacharyya, information theory, kullback-leibler, mathematical-statistics

I'm looking for an intuitive explanation for the following questions:

In statistics and information theory, what's the difference between Bhattacharyya distance and KL divergence, as measures of the difference between two discrete probability distributions?

Are they completely unrelated, measuring the distance between two probability distributions in totally different ways?

Best Answer

The Bhattacharyya coefficient is defined as $$D_B(p,q) = \int \sqrt{p(x)q(x)}\,\text{d}x$$ and can be turned into a distance $d_H(p,q)$ as $$d_H(p,q)=\{1-D_B(p,q)\}^{1/2}\,,$$ which is called the Hellinger distance. A connection between this Hellinger distance and the Kullback-Leibler divergence is $$d_{KL}(p\|q) \geq 2 d_H^2(p,q) = 2 \{1-D_B(p,q)\}\,,$$ since, using $-\log u \ge 1-u$ for $u>0$ and the fact that $p$ and $q$ both integrate to one, \begin{align*} d_{KL}(p\|q) &= \int \log \frac{p(x)}{q(x)}\,p(x)\text{d}x\\ &= 2\int \log \frac{\sqrt{p(x)}}{\sqrt{q(x)}}\,p(x)\text{d}x\\ &= 2\int -\log \frac{\sqrt{q(x)}}{\sqrt{p(x)}}\,p(x)\text{d}x\\ &\ge 2\int \left\{1-\frac{\sqrt{q(x)}}{\sqrt{p(x)}}\right\}\,p(x)\text{d}x\\ &= 2\int \left\{p(x)-\sqrt{p(x)}\sqrt{q(x)}\right\}\,\text{d}x\\ &= \int \left\{p(x)+q(x)-2\sqrt{p(x)}\sqrt{q(x)}\right\}\,\text{d}x\\ &= \int \left\{\sqrt{p(x)}-\sqrt{q(x)}\right\}^2\,\text{d}x\\ &= 2d_H(p,q)^2\,. \end{align*}

However, this is not the question: if the Bhattacharyya distance is defined as $$d_B(p,q)\stackrel{\text{def}}{=}-\log D_B(p,q)\,,$$ then, by Jensen's inequality applied to the convex function $-\log$, \begin{align*}d_B(p,q)=-\log D_B(p,q)&=-\log \int \sqrt{p(x)q(x)}\,\text{d}x\\ &\stackrel{\text{def}}{=}-\log \int h(x)\,\text{d}x\\ &= -\log \int \frac{h(x)}{p(x)}\,p(x)\,\text{d}x\\ &\le \int -\log \left\{\frac{h(x)}{p(x)}\right\}\,p(x)\,\text{d}x\\ &= \int \frac{-1}{2}\log \left\{\frac{h^2(x)}{p^2(x)}\right\}\,p(x)\,\text{d}x\\ &= \frac{1}{2}\int \log \left\{\frac{p(x)}{q(x)}\right\}\,p(x)\,\text{d}x\\ &= \frac{1}{2}\,d_{KL}(p\|q)\,. \end{align*} Hence, the inequality between the two distances is $${d_{KL}(p\|q)\ge 2d_B(p,q)\,.}$$ One could then wonder whether this inequality follows from the first one. It happens to be the opposite: since $$-\log(x)\ge 1-x\qquad\qquad 0\le x\le 1\,,$$ taking $x=D_B(p,q)$ gives $d_B(p,q)\ge 1-D_B(p,q)=d_H(p,q)^2$, and so

we have the complete ordering $${d_{KL}(p\|q)\ge 2d_B(p,q)\ge 2d_H(p,q)^2\,.}$$
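Since the question is about discrete distributions, the ordering can be checked numerically by replacing the integrals with sums. The following is only a minimal sketch: the function names and the two example distributions `p` and `q` are illustrative choices, not part of the original answer, and it assumes `q` is strictly positive wherever `p` is.

```python
import math

def bhattacharyya_coefficient(p, q):
    # D_B(p, q) = sum_x sqrt(p(x) q(x)), the discrete analogue of the integral
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def hellinger_squared(p, q):
    # d_H(p, q)^2 = 1 - D_B(p, q), following the convention used in this answer
    return 1.0 - bhattacharyya_coefficient(p, q)

def bhattacharyya_distance(p, q):
    # d_B(p, q) = -log D_B(p, q)
    return -math.log(bhattacharyya_coefficient(p, q))

def kl_divergence(p, q):
    # d_KL(p || q) = sum_x p(x) log(p(x) / q(x)); terms with p(x) = 0 contribute 0.
    # Assumes q(x) > 0 wherever p(x) > 0 (otherwise the divergence is infinite).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on two points
p = [0.5, 0.5]
q = [0.9, 0.1]

kl = kl_divergence(p, q)
two_db = 2 * bhattacharyya_distance(p, q)
two_dh2 = 2 * hellinger_squared(p, q)

print(f"d_KL = {kl:.4f}, 2 d_B = {two_db:.4f}, 2 d_H^2 = {two_dh2:.4f}")
print("ordering holds:", kl >= two_db >= two_dh2)
```

For this particular pair the three quantities come out to approximately 0.5108, 0.2231 and 0.2111, in agreement with $d_{KL}(p\|q)\ge 2d_B(p,q)\ge 2d_H(p,q)^2$. The example also shows that the two Bhattacharyya-based quantities are symmetric in $p$ and $q$, while the KL divergence is not.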
