Distributions – How Does Bhattacharyya Distance Not Satisfy Triangle Inequality

bhattacharyya-distance, distributions, metric

Googling doesn't turn up many informative results. I don't know whether the concept is so trivial that I should see it immediately, or whether it's simply an old topic. Everything I find is either an article or blog repeating the Wikipedia entry, or just explaining the calculation.

On Wikipedia, the Bhattacharyya distance is defined as

$$D_B(p,q) = -\ln \left( BC(p,q) \right)$$

where the Bhattacharyya coefficient (for discrete distributions over a domain $X$) is given by

$$BC(p,q) = \sum_{x\in X} \sqrt{p(x) q(x)}$$

with these conditions

$$0 \leq BC \leq 1 \quad \text{and} \quad 0 \leq D_B \leq \infty.$$

From the article, it seems I should be able to deduce easily, given the conditions above, that the distance doesn't satisfy the triangle inequality. But I have no idea how.

Best Answer

The triangle inequality would be that $$D_B(p,q)\leq D_B(p,r)+D_B(r,q)$$ for all probability distributions $p,q,r$. So to show that the inequality does not hold, it is sufficient to find one counterexample.

One such counterexample is given by the following simple Bernoulli distributions: $$ p=(0.1,0.9), \quad q=(0.9,0.1), \quad r=(0.5,0.5). $$ Then $$D_B(p,q) = -\ln(2\sqrt{0.09}) \approx 0.51$$ but $$D_B(p,r)=D_B(r,q)=-\ln(\sqrt{0.05}+\sqrt{0.45})\approx 0.11, $$ so $D_B(p,r)+D_B(r,q)\approx 0.22 < 0.51 \approx D_B(p,q)$ and the triangle inequality fails.
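If you want to verify these numbers yourself, the same DB helper used in the script below reproduces them in a couple of lines of R:

DB <- function(pp, qq) -log(sum(sqrt(pp*qq)))  # Bhattacharyya distance
DB(c(0.1, 0.9), c(0.9, 0.1))   # ~0.51
DB(c(0.1, 0.9), c(0.5, 0.5))   # ~0.11; D_B(r,q) is identical by symmetry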


In general, I hack together a simple R script when searching for such counterexamples. (Or when I have a hunch and want to test it before thinking deeply about it. "Computers are cheap, and thinking hurts.") In the present case, a script like the following quickly points us in the right direction:

nn <- 2                                        # dimension of the distributions (Bernoulli-type)
normalize <- function(xx) xx/sum(xx)           # turn a positive vector into a probability vector
DB <- function(pp,qq) -log(sum(sqrt(pp*qq)))   # Bhattacharyya distance

while ( TRUE ) {
    # draw three random distributions and test the triangle
    # inequality with qq as the intermediate point
    pp <- normalize(runif(nn))
    qq <- normalize(runif(nn))
    rr <- normalize(runif(nn))
    if ( DB(pp,rr) > DB(pp,qq)+DB(qq,rr) ) {
        cat(pp,"\n",qq,"\n",rr,"\n")   # print the offending triple and stop
        break
    }
}
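Because the loop runs in the global environment, the offending triple pp, qq, rr is still in the workspace once the loop breaks, so the violation can be confirmed directly (a small usage sketch; call set.seed first if you want a reproducible run):

DB(pp, rr)               # left-hand side of the triangle inequality
DB(pp, qq) + DB(qq, rr)  # right-hand side: strictly smaller for the printed triple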