Distributions – How Does Bhattacharyya Distance Not Satisfy Triangle Inequality

bhattacharyya-distance, distributions, metric

Googling doesn't turn up many informative results. I don't know whether the concept is so trivial that I should see it immediately, or whether it's simply an old topic. Everything I find is either an article or blog repeating the Wikipedia entry, or just explaining the calculation.

On Wikipedia, the Bhattacharyya distance is defined as

$$D_B(p,q) = -\ln \left( BC(p,q) \right)$$

where the Bhattacharyya coefficient (for discrete distributions over a domain $X$) is given by

$$BC(p,q) = \sum_{x\in X} \sqrt{p(x) q(x)}$$

with these conditions

$$0 \leq BC \leq 1 \quad \text{and} \quad 0 \leq D_B \leq \infty.$$

From the article, it seems I should be able to deduce easily, given the conditions above, that the distance doesn't satisfy the triangle inequality. But I have no idea how.

Best Answer

The triangle inequality would be that $$D_B(p,q)\leq D_B(p,r)+D_B(r,q)$$ for all probability distributions $p,q,r$. So to show that the inequality does not hold, it is sufficient to find one counterexample.

One such counterexample is given by the following simple Bernoulli distributions: $$ p=(0.1,0.9), \quad q=(0.9,0.1), \quad r=(0.5,0.5). $$ Then $$D_B(p,q) = -\ln(2\sqrt{0.09}) \approx 0.51$$ but $$D_B(p,r)=D_B(r,q)=-\ln(\sqrt{0.05}+\sqrt{0.45})\approx 0.11, $$ so $D_B(p,r)+D_B(r,q)\approx 0.22 < 0.51 \approx D_B(p,q)$ and the triangle inequality fails.
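If you want to verify these numbers yourself, the same DB helper used in the script below reproduces them in a couple of lines of R:

DB <- function(pp, qq) -log(sum(sqrt(pp*qq)))  # Bhattacharyya distance
DB(c(0.1, 0.9), c(0.9, 0.1))   # ~0.51
DB(c(0.1, 0.9), c(0.5, 0.5))   # ~0.11; D_B(r,q) is identical by symmetry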


In general, I hack together a simple R script when searching for such counterexamples. (Or when I have a hunch and want to test it before thinking deeply about it. "Computers are cheap, and thinking hurts.") In the present case, a script like the following quickly points us in the right direction:

nn <- 2                                        # dimension of the distributions (Bernoulli-type)
normalize <- function(xx) xx/sum(xx)           # turn a positive vector into a probability vector
DB <- function(pp,qq) -log(sum(sqrt(pp*qq)))   # Bhattacharyya distance

while ( TRUE ) {
    # draw three random distributions and test the triangle
    # inequality with qq as the intermediate point
    pp <- normalize(runif(nn))
    qq <- normalize(runif(nn))
    rr <- normalize(runif(nn))
    if ( DB(pp,rr) > DB(pp,qq)+DB(qq,rr) ) {
        cat(pp,"\n",qq,"\n",rr,"\n")   # print the offending triple and stop
        break
    }
}
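Because the loop runs in the global environment, the offending triple pp, qq, rr is still in the workspace once the loop breaks, so the violation can be confirmed directly (a small usage sketch; call set.seed first if you want a reproducible run):

DB(pp, rr)               # left-hand side of the triangle inequality
DB(pp, qq) + DB(qq, rr)  # right-hand side: strictly smaller for the printed triple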