Knott-Smith optimality

convex-analysisduality-theoremsoptimal-transport

This thread is meant to record a question that I feel interesting during my self-study. I'm very happy to receive your suggestion and comments. See: SE blog: Answer own Question and MSE meta: Answer own Question.


Let $X,Y$ be topological spaces. Let $\mathcal P(X), \mathcal P(Y)$ be the spaces of all Borel probability measures on $X,Y$ respectively. Let $c: X \times Y \to [0, +\infty]$. Fix $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$.

  • $\Pi(\mu, \nu)$ is the set of $\pi \in \mathcal P(X \times Y)$ such that for all measurable subsets $A \subset X$ and $B \subset Y$,
    $$
    \pi (A \times Y) = \mu (A) \quad \text{and} \quad \pi (X \times B) = \nu (B).
    $$

  • $\Phi_{c}$ is the set of all $(\varphi, \psi) \in L_1 (\mu) \times L_1 (\nu)$ satisfying
    $$
    \varphi(x)+\psi(y) \leq c(x, y)
    $$

    for $\mu$-a.e. $x \in X$ and $\nu$-a.e. $y \in Y$.

  • For $\pi \in \mathcal P(X \times Y)$ and $(\varphi, \psi) \in L_1 (\mu) \times L_1 (\nu)$, let
    $$
    \mathbb K (\pi) := \int_{X \times Y} c d \pi \quad \text{and} \quad \mathbb J(\varphi, \psi) := \int_{X} \varphi d \mu+\int_{Y} \psi d \nu .
    $$

Theorem: Let $X = Y = \mathbb R^d$ and assume that $\mu, \nu$ both have finite second moment. Then $\pi$ is a minimizer of $\mathbb K$ over $\Pi(\mu, \nu)$ with cost $c(x,y) := \frac{1}{2} |x-y|^2$ if and only if there is $\varphi \in L_1 (\mu)$ convex l.s.c. such that $y \in \partial f (x)$ for $\pi$-a.e. $(x, y) \in X \times Y$. Moreover, the pair $(\varphi, \varphi^*)$ is a minimizer of $\mathbb J$ over $\tilde \Phi$ where $\varphi^*$ is the convex conjugate and
$$
\tilde \Phi := \{(\tilde \varphi, \tilde \psi) \in L_1(\mu) \times L_1 (\nu) \mid \tilde \varphi (x) + \tilde \psi (y) \ge \langle x, y \rangle\}.
$$

Best Answer

Let $c_X = c_Y := \frac{1}{2} |\cdot|^2$. We need the following Lemma

Lemma: If $X,Y$ are Polish spaces, $c$ l.s.c., then $$ \min_{\pi \in \Pi(\mu, \nu)} \mathbb K (\pi) = \sup _{(\varphi, \psi) \in\Phi_c} \mathbb J (\varphi, \psi) . $$ If, moreover, there are $(c_X, c_Y) \in L_1(\mu) \times \in L_1 (\nu)$ such that $c (x, y) \le c_X(x)+c_Y(y)$ for $\mu$-a.e. $x\in X$ and $\nu$-a.e. $y\in Y$, then the maximization $$ \sup_{(\varphi, \psi) \in \Phi_c} J (\varphi, \psi) $$ has a solution of the form $(\psi, \psi^c)$ for some $\psi \in L_1(\mu)$ with $\psi^{cc} = \psi$. Here $\psi^c$ is the $c$-transform of $\psi$.

  • $\implies$

Let $\pi^\dagger$ minimize $\mathbb K$ over $\Pi(\mu, \nu)$. By Lemma, there is $(\varphi, \varphi^c)$ with $\varphi^{cc} = \varphi$ maximizing $\mathbb J$ over $\Phi_c$. Because $c(x, y) = c_X(x) + c_Y(y) - \langle x, y\rangle$, we have $(\tilde \varphi, \tilde \psi) := (c_X - \varphi, c_Y - \varphi^c)$ minimizes $\mathbb J$ over $\tilde \Phi$. It's can be shown that $\tilde \psi$ is the convex conjugate of $\tilde \varphi$, i.e., $\tilde \psi = \tilde \varphi^*$, and that $\tilde \varphi$ is the convex conjugate of $\tilde \varphi^*$, i.e., $\tilde \varphi = \tilde \varphi^{**}$. By this result, $\tilde \varphi$ is convex l.s.c. By Lemma, $$ \int_X \tilde \varphi (x) d \mu (x) + \int_Y \tilde \psi (y) d \nu (y)= \int_{X \times Y} \langle x, y\rangle d\pi^\dagger(x, y), $$ or equivalently $$ \int_{X \times Y} [\tilde \varphi (x) + \tilde \varphi^* (y) - \langle x, y\rangle] d\pi^\dagger(x, y) = 0. $$

We have $\tilde \varphi (x) + \tilde \varphi^* (y) - \langle x, y\rangle \ge 0$ for all $(x,y) \in X \times Y$, so $\tilde \varphi (x) + \tilde \varphi^* (y) - \langle x, y\rangle = 0$ for $\pi^\dagger$-a.e. $(x, y) \in X \times Y$. By this result, $y \in \partial \tilde \varphi (x)$ for $\pi^\dagger$-a.e. $(x, y) \in X \times Y$. It follows from $\tilde \varphi = \tilde \varphi^{**}$ that $\tilde \varphi$ is convex l.s.c.

  • $\impliedby$

Let $\pi^\dagger \in \Pi(\mu, \nu)$ and $\tilde \varphi \in L_1 (\mu)$ be such that $y \in \partial \tilde \varphi (x)$ for $\pi^\dagger$-a.e. $(x, y) \in X \times Y$. Then $\tilde \varphi (x) + \tilde \varphi^* (y) - \langle x, y\rangle = 0$ for $\pi^\dagger$-a.e. $(x, y) \in X \times Y$, so $$ \int_{X \times Y} [\tilde \varphi (x) + \tilde \varphi^* (y) - \langle x, y\rangle] d\pi^\dagger(x, y) = 0. $$

By this result, $\langle \cdot, \cdot \rangle$ is $\pi^\dagger$-integrable, so the map $(x, y) \mapsto \tilde \varphi (x) + \tilde \varphi^* (y)$ is also $\pi^\dagger$-integrable. Because $\tilde \varphi$ is $\mu$-integrable, so $\tilde \varphi^* \in L_1 (\nu)$. It follows that $$ \int_X\tilde \varphi d \mu + \int_Y \tilde \varphi^* d \nu = \int_{X \times Y}\langle x, y\rangle d\pi^\dagger(x, y). $$

On the other hand, $$ \int_X \varphi' d \mu + \int_Y \psi' d \nu \ge \int_{X \times Y} \langle x, y\rangle d\pi^\dagger(x, y) \quad \forall (\varphi', \psi') \in \tilde \Phi. $$

It follows that $(\tilde \varphi, \tilde \varphi^*)$ minimizes $\mathbb J$ over $\tilde \Phi$. Because $c(x, y) = c_X(x) + c_Y(y) - \langle x, y\rangle$, we have $(\varphi, \psi) := (c_X - \tilde\varphi, c_Y - \tilde \varphi^*)$ minimizes $\mathbb J$ over $\Phi_c$. This completes the proof.

Related Question