Difference between Hellinger Distance and Wasserstein Distance between Two Distributions

measure-theory, operations-research, optimal-transport, probability-distributions, probability-theory

I really want to understand the difference between the Hellinger distance and the Wasserstein distance. I come from a physics background, so I am hoping for an intuitive explanation. Does the Wasserstein distance give more information than the Hellinger distance? In what way do the two differ? Is there any relation between the Hellinger distance and the Wasserstein distance between two distributions?

Best Answer

Since you asked for an intuitive explanation, this is going to be somewhat imprecise, but hopefully helpful.

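The Hellinger distance is a bounded metric that looks at the cumulative pointwise difference in density (of two probability measures) over all points of the space. It only compares how much mass the two measures put at the same point, and ignores how far apart different points are.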
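For reference, one common way to write this down, for two densities $f$ and $g$ on the real line and with the normalisation that keeps the distance between $0$ and $1$ (other texts drop the factor $\tfrac12$), is:

$$H^2(f,g) \;=\; \frac12\int \left(\sqrt{f(t)}-\sqrt{g(t)}\right)^2 dt \;=\; 1-\int \sqrt{f(t)\,g(t)}\;dt .$$

The last integral is zero as soon as $f$ and $g$ are never positive at the same point, which is why $H$ cannot exceed $1$.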

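So let's say we have two probability measures with densities $f$ and $g$, where $f$ has support on $[0,1]$ and $g$ is $f$ shifted by $x$, that is $g(t) = f(t-x)$, so $g$ has support on $[x, x+1]$ with $x\in\mathbb{R}$. Think, for example, of two uniform distributions.

For $x=0$ the two distributions coincide, so the Hellinger distance and every Wasserstein distance between them are zero. As we increase $x$ from $0$, the Hellinger distance increases until $x \geq 1$, where the supports no longer overlap; from then on it just stays at its maximum value $1$ (with the usual normalisation), no matter how far apart the distributions are. The Wasserstein distance, on the other hand, keeps increasing as $x$ increases: for a pure shift, $W_p = |x|$ for every $p \geq 1$.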
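To make the example concrete, here is a minimal numerical sketch. It assumes both densities are uniform (on $[0,1]$ and on $[x,x+1]$), uses the normalisation in which the Hellinger distance is at most $1$, and computes $W_1$ from the area between the two CDFs. The function names are made up for illustration and the integrals are plain midpoint sums, so the values are approximate.

```python
import math

def uniform_pdf(t, a):
    """Density of the uniform distribution on [a, a + 1] (assumed example)."""
    return 1.0 if a <= t <= a + 1.0 else 0.0

def uniform_cdf(t, a):
    """CDF of the uniform distribution on [a, a + 1]."""
    return min(1.0, max(0.0, t - a))

def hellinger(x, n=20000):
    """Approximate Hellinger distance between U[0,1] and U[x,x+1].

    Uses H^2 = 1 - integral of sqrt(f * g), evaluated with a midpoint sum.
    """
    lo, hi = min(0.0, x), max(1.0, x + 1.0)
    h = (hi - lo) / n
    overlap = math.fsum(
        math.sqrt(uniform_pdf(lo + (i + 0.5) * h, 0.0)
                  * uniform_pdf(lo + (i + 0.5) * h, x)) * h
        for i in range(n)
    )
    return math.sqrt(max(0.0, 1.0 - overlap))

def wasserstein1(x, n=20000):
    """Approximate W_1 between U[0,1] and U[x,x+1].

    On the real line W_1 equals the integral of |F - G| over t.
    """
    lo, hi = min(0.0, x), max(1.0, x + 1.0)
    h = (hi - lo) / n
    return math.fsum(
        abs(uniform_cdf(lo + (i + 0.5) * h, 0.0)
            - uniform_cdf(lo + (i + 0.5) * h, x)) * h
        for i in range(n)
    )

if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 1.0, 2.0, 5.0):
        print(f"x = {x:4.2f}   Hellinger ~ {hellinger(x):.3f}   W1 ~ {wasserstein1(x):.3f}")
```

In this sketch the Hellinger column saturates at $1$ once $x \geq 1$, while the $W_1$ column keeps tracking $x$.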

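Intuitively, the Wasserstein distance is the minimal total cost of moving the mass of one distribution onto the other, where (for $W_1$) the cost of each move is the amount of mass times the distance it is displaced.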
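Written out, and assuming we work on the real line with cost $|s-t|$, the standard definition for two probability measures $\mu$ and $\nu$ is:

$$W_p(\mu,\nu) \;=\; \left(\inf_{\gamma\in\Gamma(\mu,\nu)} \int |s-t|^p \, d\gamma(s,t)\right)^{1/p},$$

where $\Gamma(\mu,\nu)$ is the set of all joint distributions (transport plans) with marginals $\mu$ and $\nu$. For $p=1$ in one dimension this reduces to the area between the two cumulative distribution functions $F$ and $G$:

$$W_1(\mu,\nu) \;=\; \int \left|F(t)-G(t)\right| dt .$$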

If this explanation is too handwavy, I can be more specific!