Brenier theorem on optimal transport

calculus-of-variationsconvex-analysisoptimal-transportprobability theory

This thread is meant to record a question that I feel interesting during my self-study. I'm very happy to receive your suggestion and comments.


Let $X,Y$ be topological spaces. Let $\mathcal P(X), \mathcal P(Y)$ be the spaces of all Borel probability measures on $X,Y$ respectively. Let $c: X \times Y \to [0, +\infty]$. Fix $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$.

  • $\Pi(\mu, \nu)$ is the set of $\pi \in \mathcal P(X \times Y)$ such that for all measurable subsets $A \subset X$ and $B \subset Y$,
    $$
    \pi (A \times Y) = \mu (A) \quad \text{and} \quad \pi (X \times B) = \nu (B).
    $$

  • $\Phi_{c}$ is the set of all $(\varphi, \psi) \in L_1 (\mu) \times L_1 (\nu)$ satisfying
    $$
    \varphi(x)+\psi(y) \leq c(x, y)
    $$

    for $\mu$-a.e. $x \in X$ and $\nu$-a.e. $y \in Y$.

  • Let
    $$
    \mathcal T := \{T:X \to Y \text{ measurable} \mid T_\sharp \mu = \nu\}.
    $$

  • For $\pi \in \mathcal P(X \times Y)$ and $(\varphi, \psi) \in L_1 (\mu) \times L_1 (\nu)$ and $T \in \mathcal T$, let
    $$
    \mathbb K (\pi) := \int_{X \times Y} c d \pi \quad \text{and} \quad \mathbb J(\varphi, \psi) := \int_{X} \varphi d \mu+\int_{Y} \psi d \nu \quad \text{and} \quad \mathbb M(T) := \int_{X} c (x, T(x)) \mathrm d \mu (x).
    $$

Brenier Theorem: Let $X = Y = \mathbb R^d$ and assume that $\mu, \nu$ both have finite second moment such that $\mu$ does not give mass to small sets (those ones with Hausdorff dimension are at most $d-1$). The cost function is $c(x,y) := \frac{1}{2} |x-y|^2$.

  1. For each $\pi^\dagger$ minimizing $\mathbb K$ over $\Pi(\mu, \nu)$, there is $T$ minimizing $\mathbb M$ over $\mathcal T$ such that $\pi^\dagger = (\operatorname{Id}, T)_\sharp \mu$.
  2. For each $T \in \mathcal T$, $T$ minimizes $\mathbb M$ over $\mathcal T$ if and only if $T =\nabla \varphi$ for some proper convex l.s.c. $\varphi \in L_1 (\mu)$.
  3. There is a unique (up to $\mu$-a.e.) minimizer of $\mathbb M$ over $\mathcal T$.

Best Answer

We need the following essential lemma, i.e.,

Knott-Smith optimality: Let $X = Y = \mathbb R^d$ and assume that $\mu, \nu$ both have finite second moment. Then $\pi^\dagger$ minimizes $\mathbb K$ over $\Pi(\mu, \nu)$ with cost $c(x,y) := \frac{1}{2} |x-y|^2$ if and only if there is $\varphi \in L_1 (\mu)$ convex l.s.c. such that $y \in \partial f (x)$ for $\pi$-a.e. $(x, y) \in X \times Y$. Moreover, the pair $(\varphi, \varphi^*)$ minimizes $\mathbb J$ over $\Phi$ where $\varphi^*$ is the convex conjugate and $$ \Phi := \{(\varphi', \psi') \in L_1(\mu) \times L_1 (\nu) \mid \varphi' (x) + \psi' (y) \ge \langle x, y \rangle\}. $$

  1. Let $\pi^\dagger$ minimize $\mathbb K$ over $\Pi(\mu, \nu)$.

By Knott-Smith optimality, there is a convex proper l.s.c. function $\varphi:X \to \mathbb R \cup \{+\infty\}$ such that $\varphi \in L_1 (\mu)$ and $y \in \partial \varphi (x)$ for $\pi^\dagger$-a.e. $(x, y) \in X \times Y$. Then $\varphi$ is differentiable $\mu$-a.e., and its gradient $\nabla \varphi$ is $\mu$-a.e. defined and Borel measurable. It follows that $y = \nabla \varphi (x)$ for $\pi$-a.e. $(x, y) \in X \times Y$. For $f \in L_1(\nu)$, we have $$ \begin{align} \int_Y f (y) \mathrm d \nu (y) &= \int_{X \times Y} f (y) \mathrm d \pi^\dagger (x,y) \\ &= \int_{X \times Y} f (\nabla \varphi (x)) \mathrm d \pi^\dagger (x,y) \\ &= \int_{X} f (\nabla \varphi (x)) \mathrm d \mu (x) \\ &= \int_{Y} f ( y) \mathrm d [(\nabla \varphi)_\sharp \mu] (y) \quad \text{because} \quad X=Y. \end{align} $$

It follows that $\nu = (\nabla \varphi)_\sharp \mu$. For $f \in L_1(\pi)$, we have $$ \begin{align} \int_{X \times Y} f (x, y) \mathrm d \pi^\dagger (x, y) &= \int_{X \times Y} f (x, \nabla \varphi (x)) \mathrm d \pi^\dagger (x,y) \\ &= \int_{X} f (x, \nabla \varphi (x)) \mathrm d \mu (x) \\ &= \int_{X \times Y} f (x, y) \mathrm d [(\operatorname{Id, \nabla \varphi})_\sharp \mu] (x). \end{align} $$

It follows that $\pi^\dagger = (\operatorname{Id}, \nabla \varphi)_\sharp \mu$.

2.

  • $\implies$ Notice that 1. implies $$ \inf_{\pi \in \Pi(\mu, \nu)} \mathbb K(\pi) = \inf_{T \in \mathcal T} \mathbb M(T). $$

We assume $T$ minimizes $\mathbb M$ over $\mathcal T$. Then $\pi' := (\operatorname{Id, T})_\sharp \mu$ minimizes $\mathbb K$ over $\Pi(\mu, \nu)$. By Knott-Smith optimality, there is a proper l.s.c. convex $\psi \in L_1 (\mu)$ such that $y = \nabla \psi (x)$ for $\pi'$-a.e. $(x, y) \in X \times Y$. Then $$ \begin{align} 0 &= \int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d \pi'(x, y) \\ &= \int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d [(\operatorname{Id}, T)_\sharp \mu] (x, y) \\ &= \int_{X} |T(x) - \nabla \psi (x)|^2 \mathrm d \mu (x). \end{align} $$

This implies $T = \nabla \psi$ $\mu$-a.e.

  • $\impliedby$Now we assume $T \in \mathcal T$ such that $T = \nabla \psi$ $\mu$-a.e. for some proper l.s.c. convex $\psi \in L_1 (\mu)$. Let $\pi' := (\operatorname{Id}, T)_\sharp \mu$. Then $$ \begin{align} &\int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d \pi'(x, y) \\ = &\int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d [(\operatorname{Id}, T)_\sharp \mu] (x, y) \\ = &\int_{X} |T(x) - \nabla \psi (x)|^2 \mathrm d \mu (x) = 0. \end{align} $$

Then $y = \nabla \psi (x) \in \partial \psi (x)$ for $\pi'$-a.e. $(x, y) \in X \times Y$. Then $\pi'$ minimize $\mathbb K$ over $\Pi(\mu, \nu)$ by Knott-Smith optimality.

  1. Let $T_1, T_2$ minimize $\mathbb M$ over $\mathcal T$.

By 2., $T_1 =\nabla \varphi_1$ and $T_2 =\nabla \varphi_2$ for some proper convex l.s.c. $\varphi_1, \varphi_2 \in L_1 (\mu)$. Then $\pi_1^\dagger := (\operatorname{Id}, \nabla \varphi_1)_\sharp \mu$ and $\pi_2^\dagger := (\operatorname{Id}, \nabla \varphi_2)_\sharp \mu$ minimize $\mathbb K$ over $\Pi(\mu, \nu)$. By Knott-Smith optimality, $y = \nabla \varphi_1 (x)$ for $\pi_1^\dagger$-a.e. $(x, y) \in X \times Y$ and $y = \nabla \varphi_2 (x)$ for $\pi_2^\dagger$-a.e. $(x, y) \in X \times Y$. It follows that $$ \begin{align} & \int_X \varphi_1 \mathrm d \mu + \int_Y \varphi^*_1 \mathrm d \nu \\ =& \int_{X \times Y} [\varphi_1 (x) + \varphi^*_1 (y)] \mathrm d \pi_2^\dagger (x, y)\\ =& \int_X [\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x))] \mathrm d \mu (x) \end{align}. $$ and $$ \begin{align} & \int_X \varphi_2 \mathrm d \mu + \int_Y \varphi^*_2 \mathrm d \nu \\ =& \int_{X \times Y} [\varphi_2 (x) + \varphi^*_2(y)] \mathrm d \pi_2^\dagger (x, y) \\ =& \int_{X} [\varphi_2 (x) + \varphi^*_2( \nabla \varphi_2 (x))] \mathrm d \mu (x) \\ =& \int_{X} \langle x, \nabla \varphi_2 (x)\rangle \mathrm d \mu (x) \quad (\star) \end{align} $$

Here $(\star)$ follows from this result about subdifferential and convex conjugate. Also by Knott-Smith optimality, $$ \int_X \varphi_1 \mathrm d \mu + \int_Y \varphi^*_1 \mathrm d \nu = \int_X \varphi_2 \mathrm d \mu + \int_Y \varphi^*_2 \mathrm d \nu. $$

So $$ \int_X [\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x))] \mathrm d \mu (x) = \int_{X} \langle x, \nabla \varphi_2 (x)\rangle \mathrm d \mu (x). $$

On the other hand, we always have $\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x)) \ge \langle x, \nabla \varphi_2 (x)\rangle$. It follows that $\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x)) = \langle x, \nabla \varphi_2 (x)\rangle$ for $\mu$-a.e. $x\in X$, so $\nabla \varphi_2 (x) \in \partial \varphi_1 (x)$ for $\mu$-a.e. $x\in X$. Hence $\nabla \varphi_2 (x) = \nabla \varphi_1 (x)$ $\mu$-a.e. This completes the proof.