Simple bounds of the Wasserstein distance

optimal-transportprobability theory

I have some "proofs" of simple assertions regarding the Wasserstein distance (=the earth mover's distance), but I'm having trouble in making them precise. Can I know if the following assertions are true, and whether my "proofs" are sound?


Denote by $W_p$ the $p$Wasserstein distance.

Proposition 1. Let $M$ be a Polish metric space. Suppose $A, B \subseteq M$ are Borel measurable, and that there is a continuous bijection $f : A \rightarrow B$ which is bounded in the sense that $C := \sup_{x \in A} d_M(x, f(x)) < \infty$. Let $\mu$ be a Borel probability measure on $M$ with its support contained in $A$, and let $f_* \mu$ be the pushforward of $\mu$ along $f$. Then for any $p \ge 1$,
$$W_p(\mu, f_* \mu) \le C$$
"Proof". There is at least one transportation plan from $A$ to $B$, given by mapping each $x \in A$ to $f(x) \in B$ [what exactly, rigorously?]. The transportation cost $d_M(x, f(x))$ is bounded by the supremum of all such transportations, and thus we get the assertion.

Proposition 2. Let $M$ be a Polish metric space with a finite diameter $D := \sup_{x,y \in M} d_M(x,y)$. Fix a Borel probability measure $\mu$ on $M$. Let $f$ be a Lipschitz continuous function on $M$ with the Lipschitz constant $\alpha$, such that $\int_M f(x) \text{d} \mu(x) = 1$. Let $\mu_f$ be the Borel probability measure on $M$ given by taking $f$ as the probability density function. Then for any $p \ge 1$,
$$W_p(\mu_f, \mu) \le \alpha D^2 $$
"Proof". Note that $\mu$ is the 'uniform measure' with respect to itself, in that $\mu(A) = \int_A 1 \cdot \text{d}\mu(x)$. Also observe that by the Lipschitz property we have that $\sup_{x \in X} f(x) – \inf_{x \in X} f(x) \le \alpha D$. Thus the task is essentially about flattening a 'pile of earth' with bounded height fluctuations. The total amount of mass that must be transported is at most $\alpha D$, and the cost of transportation per unit mass is at most $D$ [but how do we realize an explicit transportation plan?]. Multiplying, we get the bound $\alpha D \cdot D = \alpha D^2$.


In my context, it's fine to prove the assertion for simpler cases such as when $M= \mathbb R^n$ for some $n> 0$. Correspondingly, the measure $\mu$ in the Proposition 2 may be taken to be the Lebesgue measure.

Best Answer

The joint probability distribution needed for the proof of Prop. 1 is the pushforward of $\mu$ under the map $x \mapsto (x,f(x))$ from $M$ to $M \times M$.

The heuristic argument given for proposition 2 only applies to $p=1$; for larger $p$ the bound is a bit different.

Since $\int f \, d\mu =1$, we have $$(*) \quad \forall x \in M \quad |f(x)-1| \le \alpha D \,.$$

The joint probability distribution needed for the proof of Prop. 2 is $\gamma$ defined as follows. Let $g(x)=f(x) \wedge 1$ and $ \, c=1- \int g(x) \, d\mu(x)$, so by $(*)$, $$(**) \quad 0 \le c \le \alpha D \,.$$ If $c=0$ then $\mu=\mu_f$, so we may assume that $c>0$. Then define $\gamma$ via integration against bounded continuous functions $h \in C(M \times M)$ by

$$\int h(x,y) \, d\gamma(x,y):= \int h(x,x) g(x) \, d\mu(x) \\ + c^{-1}\int \int h(x,y) (1 -g(x)) (f(y)-g(y)) \,d\mu(x) \, d\mu(y) \,. $$

It is easy to verify that $\gamma$ is a coupling of $\mu$ and $\mu_f$. Moreover, $$\int d(x,y)^p \, d\gamma(x,y) \\ \le c^{-1}\int \int D^p (1 -g(x)) (f(y)-g(y)) \,d\mu(x) \, d\mu(y) =c D^p \le \alpha D^{p+1} \,, $$ by $(**)$. Thus $$W_p(\mu,\mu_f) \le (\alpha D^{p+1})^{1/p} \,.$$