What does the Wasserstein distance between two distributions quantify?

computational-statistics, machine learning, mathematical-statistics, python, scipy

I am trying to understand what the Wasserstein distance between two distributions actually means.

I have two samples coming from two distributions: a ground truth one and its empirical realization. I know that the Wasserstein distance can be used to quantify the difference between the two distributions. My question is: when do we consider the distance between these distributions "small" enough? Or, what does this number mean? Say we obtain 0.25 for the distance. What does that tell us?

I think the answer to this question comes down to understanding what exactly the distance quantifies. This goes beyond the simple interpretation of the definition: the minimum cost of obtaining the first distribution by transporting the probability mass of the second one.
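To make the transport reading concrete for samples like the ones in this question: with two samples of equal size and equal weights, the optimal plan in 1-D simply pairs the sorted points, so the distance is the average distance a unit of mass has to move. A minimal sketch, assuming equal sample sizes; the helper `w1_sorted_matching` is hypothetical (not part of SciPy) and is only checked against `scipy.stats.wasserstein_distance`:

```python
import numpy as np
from scipy.stats import wasserstein_distance


def w1_sorted_matching(x, y):
    """1-Wasserstein distance between two equal-size, equally weighted samples.

    Hypothetical helper for illustration. In 1-D the optimal transport plan
    sends the i-th smallest point of x to the i-th smallest point of y.
    Each point carries mass 1/n, so the total cost is the average distance
    that each unit of mass travels.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    ys = np.sort(np.asarray(y, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch only handles samples of equal size")
    return float(np.mean(np.abs(xs - ys)))


pairs = [([0, 1, 3], [5, 6, 8]), ([0, 1, 3], [2, 2, 3])]
for x, y in pairs:
    # The manual sorted matching and SciPy should agree up to rounding.
    print(w1_sorted_matching(x, y), wasserstein_distance(x, y))
```

Read this way, a value such as 0.25 means that, under the cheapest plan, probability mass travels 0.25 units of the data on average.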

I am including a Python example here, and I would appreciate an answer with concrete examples.

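from scipy.stats import wasserstein_distance
# Every point of the second sample is a point of the first shifted by 5, so this prints 5.0 (up to rounding).
print(wasserstein_distance([0, 1, 3], [5, 6, 8]))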

(Note: the SciPy implementation works only on 1-D distributions.)
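One way to judge a number like 0.25 is to see how the 1-D distance behaves under simple transformations: it is expressed in the same units as the data, so shifting a sample by c gives a distance of |c|, and rescaling both samples by k rescales the distance by k. A small sketch, using synthetic normal samples chosen only for illustration:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)

# Shifting every point by c moves all of the mass by exactly c, so W1 equals |c|.
for c in (0.25, 1.0, 5.0):
    print(c, wasserstein_distance(x, x + c))

# Rescaling both samples by k (e.g. metres -> centimetres) rescales W1 by k,
# so the value only means something relative to the scale of the data.
y = rng.normal(loc=0.5, scale=1.0, size=1000)
d = wasserstein_distance(x, y)
print(d, wasserstein_distance(100 * x, 100 * y) / d)
```

So 0.25 is "small" for data spread over hundreds of units and large for data spread over a fraction of a unit.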

Best Answer

Wasserstein (or EMD, the earth mover's distance), once you multiply it by your bandwidth (the bin size), measures the "work" necessary to transform one distribution into the other, obtained by solving the optimal transport problem. Roughly, it is the integrated difference between the two distributions multiplied by the distance between their centers. NOTE: this is an approximation only, given here for the sake of a simple explanation. Wasserstein makes NO USE of the centers or averages of the distributions, and it DOES USE a distance matrix that is user-provided and can be asymmetric or use non-linear steps. The attached figure uses a symmetric distance matrix built with linear steps equal to the bin size of the distributions.
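The bin-width scaling and the role of the distance matrix can be checked numerically. The sketch below is an illustration under stated assumptions, not the answer's original code: it poses discrete optimal transport between two histograms as a linear program with `scipy.optimize.linprog`, where the helper `emd` and both histograms are hypothetical. With a symmetric cost built from linear steps, solving in bin steps and multiplying by the bin width should give the same value as using costs in data units, and both should match `wasserstein_distance` on the bin centers.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance


def emd(a, b, cost):
    """Optimal transport cost between histograms a and b of equal total mass.

    Hypothetical helper. Solves the linear program
        min  sum_ij cost[i, j] * T[i, j]
        s.t. sum_j T[i, j] = a[i],  sum_i T[i, j] = b[j],  T >= 0,
    where T[i, j] is the mass moved from bin i of a to bin j of b.
    """
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    constraints = []
    for i in range(n):  # mass leaving bin i must equal a[i]
        row = np.zeros((n, m))
        row[i, :] = 1.0
        constraints.append(row.ravel())
    for j in range(m):  # mass arriving in bin j must equal b[j]
        col = np.zeros((n, m))
        col[:, j] = 1.0
        constraints.append(col.ravel())
    res = linprog(cost.ravel(), A_eq=np.array(constraints),
                  b_eq=np.concatenate([a, b]), bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Two hypothetical histograms on six bins of width 0.5, normalised to mass 1.
bin_width = 0.5
centers = np.arange(6) * bin_width
a = np.array([4, 3, 2, 1, 0, 0], dtype=float)
b = np.array([0, 0, 1, 2, 3, 4], dtype=float)
a /= a.sum()
b /= b.sum()

# Symmetric distance matrix with linear steps: bins i and j are |i - j| steps apart.
idx = np.arange(len(centers))
steps = np.abs(idx[:, None] - idx[None, :])

print(emd(a, b, steps) * bin_width)   # cost in bin steps, then times the bin width
print(emd(a, b, steps * bin_width))   # cost already expressed in data units
print(wasserstein_distance(centers, centers, a, b))

# A different user-chosen cost matrix (here: squared distances) changes the value.
print(emd(a, b, (steps * bin_width) ** 2))
```

The dense linear program has one variable per pair of bins, so this is only practical for small histograms; dedicated optimal-transport solvers scale much better.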

The Wikipedia page explains everything with the formal mathematical definitions: en.wikipedia.org/wiki/Wasserstein_metric
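For reference, the standard definition (with Γ(μ, ν) the set of couplings of μ and ν and d the ground distance) and its 1-D special case with p = 1 and d(x, y) = |x − y|, where F and G are the two CDFs. The 1-D formula is the quantity `scipy.stats.wasserstein_distance` evaluates, and it is the exact counterpart of the rough "integrated difference" description above:

```latex
W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int d(x, y)^p \, \mathrm{d}\gamma(x, y) \right)^{1/p}

W_1(\mu, \nu) = \int_{-\infty}^{\infty} \lvert F(t) - G(t) \rvert \, \mathrm{d}t
```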

Below you can see the metrics computed with respect to the reference distribution in BOLD BLUE.

