On the optimal couplings in the Wasserstein metric

probability, probability-theory, real-analysis

The Wasserstein metric is a well-known metric between two probability distributions, defined as an infimum over the set of couplings of the two distributions. A coupling which attains this infimum is known as an optimal coupling of the measures.

Does anyone have any references or ideas on explicitly describing this coupling for a pair of well-known distributions? The case of Dirac measures seems trivially easy, but apart from that I have been stuck.

Best Answer

Wasserstein metrics are a family, parametrized by $p\in [1,\infty]$. My answer focusses on the case $p=2$, which is the standard one for geometric purposes. Explicit computations are possible for various examples.

General metric spaces:

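  1. For the pair $(\mu,\delta_x)$, with $\mu$ arbitrary, the coupling is unique, hence optimal, and it is the obvious one: $\mu\otimes\delta_x$.

  2. For $(\mu,\nu)$ with $\mu=\sum_{i=1}^m a_i \delta_{x_i}$ and $\nu=\sum_{j=1}^n b_j \delta_{y_j}$ (convex combinations of Dirac measures, with the $x_i$ distinct and the $y_j$ distinct), the set of all couplings $\pi$ between $\mu$ and $\nu$ is in bijection with the set of $m\times n$ matrices $(\pi_{ij})$ with nonnegative entries, row sums $a_i$ and column sums $b_j$. When $m=n$ and $a_i=b_j=1/n$, these are exactly the bistochastic matrices scaled by $1/n$. For small $m,n$ it is a good exercise to compute the integral $\int d^2(x,y)\,d\pi=\sum_{i,j}\pi_{ij}\,d^2(x_i,y_j)$ and try to minimize it over this set.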
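The exercise in item 2 can be tried by brute force in the simplest situation. The following is only a minimal sketch, assuming $m=n$ and uniform weights $a_i=b_j=1/n$: the cost is linear in $\pi$, so by the Birkhoff-von Neumann theorem a minimizer can be found among the permutation couplings. The function name `optimal_permutation_coupling` and the sample points are illustrative choices, not part of any library.

```python
from itertools import permutations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def optimal_permutation_coupling(xs, ys, metric=dist):
    """Brute-force W_2 between the uniform measures on xs and on ys.

    Assumes equal weights 1/n on both sides, so that an optimal coupling
    can be chosen as a permutation: xs[i] is sent to ys[sigma[i]].
    The cost is O(n!), so this is only meant for very small n.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("this sketch needs the same number of atoms on both sides")
    best_cost, best_sigma = float("inf"), None
    for sigma in permutations(range(n)):
        # Integral of d^2 against the coupling (1/n) * sum_i delta_(x_i, y_sigma(i)).
        cost = sum(metric(xs[i], ys[sigma[i]]) ** 2 for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_sigma = cost, sigma
    return best_cost ** 0.5, best_sigma


xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ys = [(0.0, 2.0), (0.0, 0.0), (2.0, 0.0)]
w2, sigma = optimal_permutation_coupling(xs, ys)
print(w2, sigma)
```

For non-uniform weights or $m\neq n$ the minimizer is in general not a permutation, and the problem has to be treated as a linear program over the full set of couplings.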

Euclidean spaces:

  1. Gaussian OT: for $\mu_1,\mu_2$ Gaussian measures on $\mathbb R^n$, it was shown in [A] that the optimal coupling is itself a Gaussian measure on $\mathbb R^{2n}$. Furthermore, the distance is given by $$W_2(\mu_1,\mu_2)=\sqrt{|m_1-m_2|^2 + \mathrm{tr}(M_1)+\mathrm{tr}(M_2)-2\,\mathrm{tr}\left[\big(\sqrt{M_1}M_2\sqrt{M_1}\big)^{1/2}\right]}$$ where $\mu_i$ has expectation $m_i$ and (positive definite) covariance matrix $M_i$.

  2. Semi-discrete OT: let $K$ be a convex subset of $\mathbb R^n$ with Lebesgue measure $\lambda_n(K)=1$. For $(\lambda_n|_K,\mu)$ with $\mu=\sum_{i=1}^N a_i\delta_{x_i}$ purely atomic, $N\in \mathbb N\cup \{+\infty\}$, there exist $N$ pairwise disjoint convex subsets $U_i$ of $K$ with $\lambda_n(U_i)=a_i$ such that the optimal coupling is $\pi=\sum_{i=1}^N \mathbb 1_{U_i}\lambda_n \otimes\delta_{x_i}$. If $N$ is finite, $\{U_i\}_i$ is a Laguerre tessellation of $K$, see e.g. [B]. This tessellation is the Voronoi tessellation with centers $x_i$ precisely when each $a_i$ equals the Lebesgue measure of the Voronoi cell of $x_i$ in $K$; equal masses $a_i=1/N$ alone are not enough.
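The Gaussian distance formula in item 1 can be evaluated without any matrix square root when $n=2$. The sketch below relies on the identity $\mathrm{tr}(A^{1/2})=\sqrt{\mathrm{tr}A+2\sqrt{\det A}}$ for a positive semidefinite $2\times 2$ matrix $A$, applied to $A=\sqrt{M_1}M_2\sqrt{M_1}$, for which $\mathrm{tr}A=\mathrm{tr}(M_1M_2)$ and $\det A=\det M_1\det M_2$. The function name `gaussian_w2_2d` and the nested-list input format are assumptions made for illustration.

```python
from math import sqrt


def gaussian_w2_2d(m1, M1, m2, M2):
    """W_2 between the Gaussians N(m1, M1) and N(m2, M2) on R^2.

    m1, m2 are pairs of floats; M1, M2 are symmetric positive definite
    2x2 matrices given as nested lists. Uses the 2x2 identity
    tr(A^{1/2}) = sqrt(tr A + 2 sqrt(det A)) with A = M1^{1/2} M2 M1^{1/2}.
    """
    mean_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    tr1 = M1[0][0] + M1[1][1]
    tr2 = M2[0][0] + M2[1][1]
    det1 = M1[0][0] * M1[1][1] - M1[0][1] * M1[1][0]
    det2 = M2[0][0] * M2[1][1] - M2[0][1] * M2[1][0]
    # tr(M1 M2) = sum over i, j of M1[i][j] * M2[j][i]
    tr12 = sum(M1[i][j] * M2[j][i] for i in range(2) for j in range(2))
    cross = sqrt(tr12 + 2.0 * sqrt(det1 * det2))
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return sqrt(max(mean_term + tr1 + tr2 - 2.0 * cross, 0.0))


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 9.0]]
w2 = gaussian_w2_2d((0.0, 0.0), identity, (3.0, 4.0), stretched)
print(w2)
```

In this example the covariance matrices commute, so the result can be checked by hand: $W_2^2=|m_1-m_2|^2+\sum_k(\sigma_{1,k}-\sigma_{2,k})^2=25+1+4=30$.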

Besides these explicit examples there are many others, treating more sophisticated distributions, usually on the real line. Furthermore, there exists a huge literature on characterizations of optimal transport maps, say $T$, by all sorts of methods (mostly descriptions of $T$ as the solution to a PDE). When the optimal coupling is unique and induced by a transport map $T$ (for instance, for $p=2$ on $\mathbb R^n$ when $\mu$ is absolutely continuous and both measures have finite second moments, by Brenier's theorem), it is given by $\pi=(\mathrm{id},T)_*\mu$.
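On the real line the optimal coupling for $W_2$ is the monotone one, $T=F_\nu^{-1}\circ F_\mu$ when $\mu$ has no atoms. As a minimal sketch of its discrete analogue, assuming two empirical measures with the same number of atoms and uniform weights, it is enough to sort both samples and match them in order. The function name `monotone_coupling` is an illustrative choice.

```python
def monotone_coupling(xs, ys):
    """Optimal W_2 coupling of two uniform empirical measures on the real line.

    Assumes len(xs) == len(ys) and weight 1/n on every atom. The i-th
    smallest point of xs is matched to the i-th smallest point of ys,
    which is the discrete form of T = F_nu^{-1} o F_mu.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch needs the same number of atoms on both sides")
    pairs = list(zip(sorted(xs), sorted(ys)))
    w2 = (sum((x - y) ** 2 for x, y in pairs) / len(pairs)) ** 0.5
    return pairs, w2


pairs, w2 = monotone_coupling([3.0, 0.0, 1.0], [2.0, 5.0, 1.0])
print(pairs, w2)
```

When the points of `xs` are distinct, the matched pairs define a map $T$ on the support of $\mu$ and the coupling has the form $(\mathrm{id},T)_*\mu$.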

[A] C.R. Givens and R.M. Shortt. A class of Wasserstein metrics for probability distributions. Michigan Math. J., 1984.

[B] C. Lautensack and S. Zuyev. Random Laguerre tessellations. Adv. in Appl. Probab., 40(3):630–650, 2008.
