Optimal transport and total variation distance

duality-theorems, optimal-transport, probability-theory, total-variation

I have a question about the following statement, which identifies the total variation distance with a particular case of optimal transport.

[Image: excerpt from Villani (2009) containing equality (6.11)]

I don't understand why equality (6.11) holds. By Kantorovich duality, the RHS is equal to $$2 \sup_{\substack{\phi \text{ Lipschitz} \\ \|\phi\|_{\text{Lip}} \leq 1}} \left( \int \phi \, d\mu - \int \phi \, d\nu \right) \equiv f(\mu, \nu),$$ since for the distance cost $c(x, y) = 1(x \ne y)$ a function is $c$-convex if and only if it is $1$-Lipschitz with respect to $c$.

As for the total variation, it is defined as
$$T(\mu, \nu) \equiv \sup_{A \in \mathcal{F}} |\mu(A) - \nu(A)|$$
where $\mathcal{F}$ is our $\sigma$-algebra on whichever Polish space we're working with. It is obvious that for $\phi(x) = 1_A(x)$, we have that $\phi$ is $1$-Lipschitz and therefore $T(\mu, \nu) \leq f(\mu, \nu)$. I'm confused about why we need the $2$ here, and how would the other direction of the inequality be shown?

Specifically, I need that for any $1$-Lipschitz function $\phi$, there exists a set $A \in \mathcal{F}$ such that $|\mu(A) - \nu(A)| \ge 2 \left( \int \phi \, d\mu - \int \phi \, d\nu \right)$, but I have no idea how to show this. Any help would be massively appreciated.

(The excerpt is from Villani (2009))

Best Answer

I think the discrepancy comes from a difference in definitions. Look at the lecture notes Probability in High Dimension by Ramon van Handel. In Example 4.14 the author writes:

$$ \|\mu - \nu\|_{TV} = \inf_{M\in\mathcal C(\mu,\nu)}M(X\neq Y) $$

And he then goes on to prove this.
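As a sanity check of the identity above, here is a minimal sketch on a three-point space. The measures `mu` and `nu` and the helper names `tv_sup` and `maximal_coupling` are made up for illustration and do not come from the notes. The coupling keeps the common mass $\min(\mu_i, \nu_i)$ on the diagonal and spreads the leftover mass off the diagonal, which is the standard construction of a coupling attaining the infimum.

```python
from fractions import Fraction as F
from itertools import combinations

def tv_sup(mu, nu):
    """Brute-force sup over all subsets A of |mu(A) - nu(A)|."""
    n = len(mu)
    best = F(0)
    for r in range(n + 1):
        for A in combinations(range(n), r):
            diff = abs(sum(mu[i] for i in A) - sum(nu[i] for i in A))
            best = max(best, diff)
    return best

def maximal_coupling(mu, nu):
    """Joint law M of (X, Y) with marginals mu, nu and as much mass on X = Y as possible."""
    n = len(mu)
    common = [min(m, v) for m, v in zip(mu, nu)]
    rest = 1 - sum(common)  # mass that cannot be matched on the diagonal
    M = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = common[i]
    if rest > 0:
        for i in range(n):
            for j in range(n):
                # Product of the two leftover parts; it vanishes when i == j.
                M[i][j] += (mu[i] - common[i]) * (nu[j] - common[j]) / rest
    return M

# Hypothetical example measures on {0, 1, 2}.
mu = [F(1, 2), F(1, 3), F(1, 6)]
nu = [F(1, 4), F(1, 4), F(1, 2)]

M = maximal_coupling(mu, nu)
p_neq = 1 - sum(M[i][i] for i in range(len(mu)))  # P[X != Y] under M

print(tv_sup(mu, nu), p_neq)
```

For these particular measures both quantities come out equal, which is what the identity predicts when TV is defined as a supremum over sets.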

What might be happening is that the excerpt uses a different definition of the TV metric.

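Indeed, with your definition of TV, the equality $$\|\mu - \nu\|_{TV} = \sup_A|\mu(A) - \nu(A)| = 2\inf P[X\neq Y]$$ would be inconsistent unless $\mu = \nu$.

To see this, let $(X, Y)$ be any coupling of $\mu$ and $\nu$. On the event $\{X = Y\}$ we have $X \in A$ exactly when $Y \in A$, so the two terms involving $X = Y$ cancel:

$$\begin{aligned} \mu(A) - \nu(A) &= P[X \in A] - P[Y \in A] \\ &= P[X \in A, X=Y] + P[X \in A, X\neq Y] - P[Y \in A, X=Y] - P[Y \in A, X\neq Y] \\ &= P[X \in A, X\neq Y] - P[Y \in A, X \neq Y] \leq P[X\neq Y]. \end{aligned}$$ By symmetry the same bound holds for $\nu(A) - \mu(A)$. Therefore, $$\sup_A|\mu(A) - \nu(A)| \leq P[X\neq Y]$$ for every coupling. Hence, $\sup_A|\mu(A) - \nu(A)|>0 \implies 2P[X\neq Y]> \sup_A|\mu(A)-\nu(A)|$.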
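One way to see where a factor of $2$ could come from is to compare the supremum over sets with the total mass of $|\mu - \nu|$ on a finite space. The sketch below assumes (this is not stated in the question) that the excerpt's $\|\mu - \nu\|_{TV}$ denotes that total mass; the example measures are made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical example measures on a three-point space.
mu = [F(1, 2), F(1, 3), F(1, 6)]
nu = [F(1, 4), F(1, 4), F(1, 2)]

# The sup over sets is attained at A = {i : mu_i > nu_i}.
sup_over_sets = sum(m - v for m, v in zip(mu, nu) if m > v)

# Total mass of the signed measure mu - nu (an l1 norm on a finite space).
l1_norm = sum(abs(m - v) for m, v in zip(mu, nu))

# Independent coupling: P[X != Y] = 1 - sum_i mu_i * nu_i.
p_neq_independent = 1 - sum(m * v for m, v in zip(mu, nu))

print(sup_over_sets, l1_norm, p_neq_independent)
```

Here `l1_norm` is exactly twice `sup_over_sets`, and the independent coupling respects the bound $\sup_A|\mu(A) - \nu(A)| \leq P[X \neq Y]$, so the two conventions differ only by the constant $2$.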
