Comparison between $L^1$ Wasserstein distance and total variation distance

optimal-transport probability-theory real-analysis

The Wasserstein-$k$ distance between two probability measures $\mu,\nu$ on $\mathbb{R}^d$ is defined as:

$$W_k(\mu,\nu)=\left(\inf_{(X,Y)\in\mathcal{C}(\mu,\nu)}\mathbb{E}\left[\|X-Y \|^k \right]\right)^{1/k}$$

where $\mathcal{C}(\mu,\nu)$ is the set of all couplings of $\mu,\nu$.

The total variation distance between the same probability measures is defined as:

$$\|\mu-\nu\|_{\mathrm{TV}}=\max_{A\subset \mathbb{R}^d}|\mu(A)-\nu(A)|=\inf \{\mathbb{P}(X\neq Y): (X,Y)\text{ is a coupling of $\mu$ and $\nu$}\}$$

where the $\inf$ is again over all couplings of $\mu,\nu$.

I am trying to check whether the total variation distance is bounded above by the Wasserstein-$1$ distance for any two probability measures. I started from the bound

$$\|\mu-\nu\|_{\mathrm{TV}}\leq \mathbb{P}(X\neq Y)$$

where $(X,Y)$ is any coupling, and then tried to apply Markov's inequality, but did not succeed. Any ideas?

Best Answer

As @lukanz mentioned, what you are trying to prove is not true. Here is an even simpler counterexample: consider the sequence $\delta_{\frac{1}{n}}$ (Dirac measures centred at $1/n$). Since $\delta_0$ and $\delta_{1/n}$ have disjoint supports, it is easy to check that

$$\lVert\delta_0 -\delta_{1/n} \rVert_{\mathrm{TV}}=1\, , \quad \forall \, n \geq 1 \, .$$

On the other hand, choosing the coupling $(X,X+1/n)$ where $X\sim \delta_0$, one has that

$$W_1\left(\delta_0,\delta_{\frac1n}\right) \leq \frac1n\, .$$

So for $n \geq 2$ the total variation distance is strictly larger than the Wasserstein-$1$ distance, and your inequality cannot hold.

The reverse inequality does hold, but only with a constant and with a weighted version of the total variation distance. You can find the statement and proof in Chapter 6 of Villani's book Optimal Transport: Old and New.
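The counterexample can also be checked numerically. The following is only a minimal sketch, assuming finitely supported measures on the real line stored as dictionaries mapping points to masses. The helper names `tv_distance` and `w1_distance` are hypothetical and not from any library. For $W_1$ it relies on the standard one-dimensional identity $W_1(\mu,\nu)=\int |F_\mu(x)-F_\nu(x)|\,dx$, where $F_\mu, F_\nu$ are the cumulative distribution functions; this identity is specific to dimension one.

```python
from fractions import Fraction


def tv_distance(mu, nu):
    """Total variation distance between two finitely supported measures.

    mu, nu: dicts mapping points to masses (each summing to 1).
    For discrete measures, TV equals half the l^1 distance of the masses.
    """
    support = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support) / 2


def w1_distance(mu, nu):
    """Wasserstein-1 distance on the real line (illustrative helper).

    Uses W_1 = integral of |F_mu - F_nu|, which for finitely supported
    measures is a finite sum over gaps between consecutive support points.
    """
    points = sorted(set(mu) | set(nu))
    total = 0
    cdf_gap = 0  # running value of F_mu - F_nu
    for left, right in zip(points, points[1:]):
        cdf_gap += mu.get(left, 0) - nu.get(left, 0)
        total += abs(cdf_gap) * (right - left)
    return total


# Dirac mass at 0 versus Dirac mass at 1/n, in exact rational arithmetic.
for n in (1, 10, 100):
    delta_0 = {Fraction(0): Fraction(1)}
    delta_1n = {Fraction(1, n): Fraction(1)}
    print(n, tv_distance(delta_0, delta_1n), w1_distance(delta_0, delta_1n))
```

For each $n$ the total variation distance stays at $1$ while the computed $W_1$ equals $1/n$, so no bound of the form $\|\mu-\nu\|_{\mathrm{TV}}\leq C\,W_1(\mu,\nu)$ with a fixed constant $C$ can hold on $\mathbb{R}^d$.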