Solved – Kullback-Leibler distance for comparing two distributions from sample points

Tags: density function, distance, kullback-leibler, sample, wasserstein

I have two data samples of a value, and I want to compute some distance that represents the difference between their distributions.
I read about the Kullback-Leibler distance, which can be used for comparing two distributions.

Would it be correct to estimate the density of each sample and pass both densities as input to compute the KL distance?

Best Answer

The Kullback-Leibler divergence is not a distance: it is not even symmetric, and you could (and most likely will) get results that differ by orders of magnitude depending on which distribution you take as the reference measure.
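To make the asymmetry concrete, here is a minimal sketch for two discrete distributions. The function name `kl_divergence` and the toy probabilities are made up for illustration, and the code assumes `q` is strictly positive wherever `p` is.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes q[k] > 0 wherever p[k] > 0; terms with p[k] == 0 contribute nothing.
    """
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Toy distributions on two points (illustrative values only).
p = [0.5, 0.5]
q = [0.9, 0.1]

print("KL(p || q) =", kl_divergence(p, q))
print("KL(q || p) =", kl_divergence(q, p))
```

Swapping the arguments changes the value, so the result depends on which distribution plays the role of the reference.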

The proper way of answering your question is to use the Wasserstein distance, in particular Wasserstein-2. This is a proper distance, defined in the setting of optimal transport theory. The Kantorovich formulation of optimal transport is given below.

I will first detail the theoretical idea, then present one practical solution (not the only one) that can be easily implemented.

The basic idea is the following: given two measures $\mu$ and $\nu$, you can quantify how 'close' they are by measuring how much kinetic energy it would take to deform one into the other.

In other words, if you had to move the mass from $\mu$ to $\nu$ by hand, and the cost of transport of one unit of mass is proportional to the distance squared, you are trying to minimize the total cost of transport.

Call $\pi(x,y)$ the amount of mass moved from $x$ to $y$. Then your objective function is

$$ \min_{\pi \geq 0} \iint |x-y|^2 \pi(x,y) \, dx \, dy $$

subject to the constraints

$$ \int \pi(x,y) \, dx = \nu(y) \quad \text{(all the mass at $y$ comes from somewhere)} $$

$$ \int \pi(x,y) \, dy = \mu(x) \quad \text{(all the mass at $x$ goes somewhere)} $$

How do we solve this in practice? Well, this is merely a linear programming problem in infinite dimensions.

For instance, if you had iid data samples $(x_i)_{i=1,\dots,n}$ and $(y_j)_{j=1,\dots,m}$, you are seeking an assignment $\pi_{ij}$ that minimizes the transport cost. One way is to solve the finite-dimensional linear program

$$ \min_{\pi_{ij} \geq 0} \sum_{i,j} |x_i-y_j|^2 \pi_{ij} $$

subject to

$$ \sum_i \pi_{ij} = \frac{1}{m} \quad \text{for every } j $$

$$ \sum_j \pi_{ij} = \frac{1}{n} \quad \text{for every } i $$

The minimum value is the squared Wasserstein-2 distance between the two empirical distributions.

More references on computational optimal transport: https://arxiv.org/pdf/1803.00567.pdf
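The finite-dimensional linear program above can be sketched with SciPy's general-purpose `linprog` solver. This is one possible implementation, not the only one. It assumes scalar (one-dimensional) samples, the helper name `wasserstein2_lp` is made up for illustration, and the dense constraint matrix has $(n+m) \times nm$ entries, so it is only practical for small samples.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein2_lp(x, y):
    """Wasserstein-2 distance between two 1-D samples via the transport LP.

    Returns (distance, plan), where plan[i, j] is the mass moved from x[i] to y[j].
    Assumes scalar samples; hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # Cost c_ij = |x_i - y_j|^2; flattened row-major, so pi_ij sits at index i*m + j.
    cost = (x[:, None] - y[None, :]) ** 2

    # Row sums: sum_j pi_ij = 1/n for every i.
    a_rows = np.kron(np.eye(n), np.ones((1, m)))
    # Column sums: sum_i pi_ij = 1/m for every j.
    a_cols = np.kron(np.ones((1, n)), np.eye(m))

    a_eq = np.vstack([a_rows, a_cols])
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])

    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    # The LP optimum is the squared distance; clip tiny negative round-off.
    return float(np.sqrt(max(res.fun, 0.0))), res.x.reshape(n, m)

# Toy samples (illustrative values only): y is x shifted by one unit.
dist, plan = wasserstein2_lp([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
print("W2 =", dist)
print(plan)
```

For larger samples, a dedicated optimal transport solver or the entropic regularization discussed in the reference above is likely a better fit than a generic LP solver.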