Solved – How to measure similarity of bivariate probability distributions

Tags: density-function, distance-functions, kernel-smoothing, similarities

I have three different distributions of 2D data:

Example 1: Three different distributions of 2D data points

or

Example 2: Three different distributions of 2D data points

Now I would like to know whether distribution two is more similar to distribution one (2 to 1) than distribution three is to distribution one (3 to 1). What is the proper way to measure these similarities (and preferably express each one as a single number)?

What I did / thought of so far:

  1. As some kind of approximation to a similarity measure, I first used bounded bivariate kernel density estimation and then correlated the resulting PDFs (see the sketch after this list).
    However, this doesn't seem to be the most appropriate approach, since the correlation is inflated by the large regions where both PDFs are highly correlated anyway (e.g., all of the low-probability regions are ~0).

  2. I have looked at using the two-sample Kolmogorov–Smirnov test for 2D distributions; however, this only tells me whether the two distributions are significantly different and does not provide a likelihood measure that would let me say the data is better predicted by one distribution than the other.

  3. Another method I thought of was fitting a curve to the data and simply measuring the Euclidean distance between the curves. However, I don't know the proper way to fit a curve to 2D data, and even if I managed to fit one, how would I determine corresponding points on the two curves at which to measure the distance?
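For concreteness, here is a minimal sketch of the approach from point 1, assuming two hypothetical (n, 2) sample arrays and SciPy's `gaussian_kde` in place of the bounded estimator mentioned above:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical 2D samples standing in for two of the distributions.
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=(500, 2))
x2 = rng.normal(0.5, 1.0, size=(500, 2))

# gaussian_kde expects data with shape (d, n).
kde1, kde2 = gaussian_kde(x1.T), gaussian_kde(x2.T)

# Evaluate both PDFs on a common grid covering all the data.
lo = np.minimum(x1.min(axis=0), x2.min(axis=0)) - 1.0
hi = np.maximum(x1.max(axis=0), x2.max(axis=0)) + 1.0
xs, ys = np.meshgrid(np.linspace(lo[0], hi[0], 200),
                     np.linspace(lo[1], hi[1], 200))
grid = np.vstack([xs.ravel(), ys.ravel()])
p, q = kde1(grid), kde2(grid)

# Correlating the flattened PDFs -- the similarity proxy from point 1,
# which is inflated by the shared near-zero regions.
print(np.corrcoef(p, q)[0, 1])
```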

Best Answer

I suggest using the Jensen-Shannon divergence (JSD). For distributions $P$ and $Q$ it is given by $$D_\text{JS}[P, Q] = \frac{1}{2} D_\text{KL}[P \mid\mid M] + \frac{1}{2} D_\text{KL}[Q \mid\mid M],$$

where $M = \frac{1}{2}(P + Q)$ and $D_\text{KL}$ is the Kullback-Leibler divergence. Its advantages over other divergences are that it's symmetric, $\sqrt{D_\text{JS}[P, Q]}$ is a proper metric, and it's fairly intuitive because of its connection to mutual information*. It can also be generalized to more than two distributions if needed. For $P$ and $Q$ you can use the nonparametric estimates you already obtained.
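As a minimal sketch of how this can be computed, assuming hypothetical (n, 2) sample arrays and grid-evaluated KDEs as in the question's sketch above (the normalization step treats each evaluated grid as a discrete probability vector):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical samples; substitute your own (n, 2) arrays.
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=(500, 2))
x2 = rng.normal(0.5, 1.0, size=(500, 2))
kde1, kde2 = gaussian_kde(x1.T), gaussian_kde(x2.T)

# Evaluate both density estimates on a shared grid.
xs, ys = np.meshgrid(np.linspace(-4.0, 5.0, 200),
                     np.linspace(-4.0, 5.0, 200))
grid = np.vstack([xs.ravel(), ys.ravel()])
p, q = kde1(grid), kde2(grid)

# Normalize so each grid sums to 1, i.e. treat them as discrete distributions.
p, q = p / p.sum(), q / q.sum()

def kl(a, b):
    """Kullback-Leibler divergence in nats, with 0 * log 0 := 0."""
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))

m = 0.5 * (p + q)
jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
print(jsd)  # 0 for identical distributions, at most ln(2) in nats
```

SciPy's `scipy.spatial.distance.jensenshannon` computes the square root of this quantity (the metric form), so it can serve as a cross-check.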

*In a nutshell: Say I randomly pick $P$ or $Q$, both with 50% probability, draw one sample $x$ from it and give it to you. If you can tell whether $x$ came from $P$ or $Q$, there is a lot of information in $x$ about which distribution it belongs to. If you cannot tell, there is little information and the two distributions must be very similar. This is what the JSD measures.
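To make this concrete, a small Monte Carlo sketch of that game with two hypothetical Gaussians standing in for $P$ and $Q$: averaging the log-ratio between the chosen distribution's density and the mixture density over many coin flips recovers the JSD.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical P and Q; any two 2D densities with a .pdf method would do.
P = multivariate_normal(mean=[0.0, 0.0])
Q = multivariate_normal(mean=[1.0, 0.0])

rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, size=n)  # fair coin: 0 -> draw from P, 1 -> from Q
x = np.where(z[:, None] == 0,
             P.rvs(n, random_state=rng),
             Q.rvs(n, random_state=rng))

p, q = P.pdf(x), Q.pdf(x)
m = 0.5 * (p + q)

# Average information the sample carries about the coin flip = JSD in nats.
jsd_mc = np.mean(np.where(z == 0, np.log(p / m), np.log(q / m)))
print(jsd_mc)
```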