Solved – Bhattacharyya distance for histograms

Tags: bhattacharyya-distance, distance-functions, histogram, kullback-leibler

One way to measure the similarity of two discrete probability distributions is the Bhattacharyya distance. In computer vision, for example, it is used to evaluate the degree of similarity between two histograms. However, this metric treats all bins as if they were independent of one another: if the histograms have 8 bins, the colour values gathered in bin 8 are very close to those in bin 7 and far away from those in bin 1, but for the Bhattacharyya distance they are all simply different.
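To illustrate the bin-independence complaint, here is a minimal pure-Python sketch of the Bhattacharyya distance (the helper name `bhattacharyya_distance` is my own). Shifting mass to an adjacent bin and to a distant bin yield exactly the same distance:

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two normalized histograms:
    D_B = -ln(sum_i sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return -math.log(bc)

p = [0.5, 0.5, 0.0, 0.0]   # mass in bins 1 and 2
q = [0.5, 0.0, 0.5, 0.0]   # half the mass moved one bin over
r = [0.5, 0.0, 0.0, 0.5]   # half the mass moved two bins over

# Both distances are -ln(0.5): the metric ignores how far the mass moved.
print(bhattacharyya_distance(p, q))  # ≈ 0.6931
print(bhattacharyya_distance(p, r))  # ≈ 0.6931
```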

Is there a metric that takes into account closeness between discrete variables?

I asked myself why the literature doesn't use the Kullback–Leibler divergence for this task, and I came up with two answers: the Bhattacharyya distance is a proper metric, and 0-valued bins are not a problem for it. Are there any other reasons?
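For reference, a small sketch of the KL divergence (the helper name `kl_divergence` is my own) shows why zero-valued bins are awkward for it: any bin where `q` is zero but `p` is not makes the divergence infinite, and it is also asymmetric:

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) for discrete distributions.
    Returns infinity when some q_i == 0 while p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # zero bin in q: divergence blows up
            total += pi * math.log(pi / qi)
    return total

print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # finite: ln 2
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # infinite: q has a zero bin
```

The two calls also demonstrate the asymmetry: D(p||q) and D(q||p) generally differ, which is one reason KL is called a divergence rather than a distance.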

Best Answer

You could use the Earth Mover's Distance (EMD), which takes a ground distance between the bins into account and solves a transportation problem (hence the name: one histogram is a set of piles of earth, the other a set of holes, and you want to fill the holes as efficiently as possible). AFAIK it is quite a standard distance for comparing images in content-based image retrieval.
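For 1-D histograms with a unit ground distance between adjacent bins, the transportation problem has a closed form: the EMD equals the L1 distance between the cumulative histograms. A short sketch (the helper name `emd_1d` is my own) shows it distinguishing a one-bin shift from a two-bin shift, exactly where the Bhattacharyya distance cannot:

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two normalized 1-D histograms,
    with ground distance 1 between adjacent bins: sum of |CDF_p - CDF_q|."""
    emd, cum = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi          # running difference of the cumulative histograms
        emd += abs(cum)
    return emd

p = [0.5, 0.5, 0.0, 0.0]
q = [0.5, 0.0, 0.5, 0.0]   # half the mass moved one bin
r = [0.5, 0.0, 0.0, 0.5]   # half the mass moved two bins

print(emd_1d(p, q))  # 0.5 — closer
print(emd_1d(p, r))  # 1.0 — farther
```

For general ground distances or multi-dimensional histograms you would solve the full transportation problem instead (e.g. `scipy.stats.wasserstein_distance` covers the 1-D case with arbitrary bin positions).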