Gluing lemma in optimal transport

optimal-transportprobability theory

This thread is meant to record a question that I feel interesting during my self-study. I'm very happy to receive your suggestion and comments.

Gluing Lemma: Let $X,Y,Z$ be Polish spaces and $\mathcal P(X), \mathcal P(Y), \mathcal P(Z)$ the spaces of Borel probability measures on $X, Y, Z$ respectively. Let $\mu \in \mathcal P(X), \nu \in \mathcal P(Y), \omega \in \mathcal P(Z)$. Let $\pi_1 \in \Pi(\mu, \nu)$ and $\pi_2 \in \Pi(\nu, \omega)$. Let $P^{X \times Y}$ and $P^{Y \times Z}$ be the projection maps from $X \times Y \times Z$ to $X \times Y$ and $Y \times Z$ respectively. Then there is $\gamma \in \mathcal P(X \times Y \times Z)$ such that
$$
P^{X \times Y}_\sharp \gamma = \pi_1 \quad \text{and} \quad P^{Y \times Z}_\sharp \gamma = \pi_2.
$$

Best Answer

Disintegration of Measures: Let $X, Z$ be Polish spaces, $f:X \to Z$ measurable, and $\pi \in \mathcal P (X)$. Let $\omega = f_\sharp \pi \in \mathcal P(Z)$ and $X_z := f^{-1} (z) \subset X$ for $z \in Z$. Then there is a family $\{\pi(\cdot |z)\}_{z\in Z}$ of Borel probability measures such that $\pi(\cdot |z) \in \mathcal P(X_z)$ and $$ \int_X g \mathrm d \pi = \int_Z \int_{X_z} g(x) \mathrm d \pi (x|z) \mathrm d \omega (z) $$ for all measurable function $g:X \to [0, +\infty]$. Moreover, $\{\pi(\cdot |z)\}_{z\in Z}$ is unique up to $\omega$-a.e.

By disintegration of measures, $$ \pi_1 (A \times B) =\int_B \pi_1(A |y) \mathrm d \nu (y) $$ for some family $\{\pi_1(\cdot |y)\}_{y \in Y} \subset \mathcal P(Y)$. Similarly, $$ \pi_2 (B \times C) =\int_B \pi_2(C |y) \mathrm d \nu (y) $$ for some family $\{\pi_2(\cdot |y)\}_{y \in Y} \subset \mathcal P(Y)$. Define a non-negative finite Borel measure $\gamma$ such that $$ \gamma (A \times B \times C) := \int_B \pi_1(A |y) \pi_2(C |y) \mathrm d \nu (y). $$

It's clear that $\gamma$ is the required measure.

Related Solutions

Is there a stronger version of the Gluing Lemma

No, the condition $(\pi_y)_\#\gamma^+=(\pi_y)_\#\gamma^-$ is necessary: If you take the pushforward under $\pi_y$ of the identities $(\pi_{x,y})_\# \sigma = \gamma^+$ and $(\pi_{y,z})_\# \sigma = \gamma^-$, you get $(\pi_y\circ \pi_{x,y})_\#\sigma=(\pi_y)_\#\gamma^+$ and $(\pi_y\circ \pi_{y,z})_\#\sigma=(\pi_y)_\#\gamma^-$. But $\pi_y\circ\pi_{x,y}=\pi_y\circ\pi_{y,z}=\pi_y$, so that the two left sides coincide.

Proof of Kantorovich Duality – Optimal Transport Theory

It's clear that $$ \sup _{\Phi_{c} \cap C_{b}} J(\varphi, \psi) \leq \sup _{\Phi_{c} \cap L^{1}} J(\varphi, \psi) \leq \inf _{\Pi(\mu, \nu)} I[\pi]. $$

So it remains to prove $$ \inf _{\Pi(\mu, \nu)} I[\pi] \le \sup _{\Phi_{c} \cap C_{b}} J(\varphi, \psi) . $$

To simplify notations, let $\varphi \oplus \psi: (x,y) \mapsto \varphi(x)+\psi(y)$. We have $$ \inf _{\pi \in \Pi(\mu, \nu)} I[\pi]=\inf _{\pi \in M_{+}(X \times Y)}\left(I[\pi]+\left\{\begin{array}{l} 0 \text { if } \pi \in \Pi(\mu, \nu) \\ +\infty \text { else } \end{array}\right)\right. $$ with $M_+(X \times Y)$ the space of non-negative Borel measures on $X\times Y$. Also, $$ \left\{\begin{array}{l} 0 \text { if } \pi \in \Pi(\mu, \nu) \\ +\infty \text { else } \end{array}\right\}=\sup \left \{ \int \varphi d \mu+\int \psi d \nu-\int \varphi \oplus \psi d \pi \,\middle\vert\, (\varphi, \psi) \in C_b(X) \times C_b(Y) \right \}, $$

It follows that $$ \begin{aligned} &\inf _{\pi \in \Pi(\mu, \nu)} I[\pi]\\ =&\inf _{\pi \in M_{+}(X \times Y)} \sup \left \{ \int_{X \times Y} c d \pi +\int_{X} \varphi d \mu+\int_{Y} \psi d \nu - \int_{X \times Y} \varphi \oplus \psi d \pi \,\middle\vert\, (\varphi, \psi) \in C_b(X) \times C_b(Y) \right\}. \end{aligned} $$

Let us first assume that $X, Y$ are compact.

Let $E:=C_{b}(X \times Y)$ be the set of all bounded continuous functions on $X \times Y$, equipped with its usual supremum norm $\|\cdot\|_{\infty}$.
Because $X,Y$ are compact, so is $X\times Y$. So $E$ coincides with the space $C_{0}(X \times Y)$ of all continuous functions vanishing at infinity on $X \times Y$.
By Riesz' theorem, the topological dual $E^*$ of $E$ can be identified with the space of regular Radon measures, $M(X \times Y)$, normed by total variation.
Moreover, a nonnegative linear form on $E$ corresponds with a regular nonnegative Borel measure.

Then we introduce $$ \begin{gathered} \Theta: u \in E \longmapsto\left\{\begin{array}{l} 0 &\text {if } u \geq-c, \\ +\infty &\text {else}. \end{array}\right.\\ \text{}\\ \Xi: u \in E \longmapsto\left\{\begin{array}{l} \int_{X} \varphi d \mu+\int_{Y} \psi d \nu &\begin{align*} &\text {if } u = \varphi \oplus \psi \text{ for}\\ & \text{some } (\varphi, \psi) \in C_b(X) \times C_b(Y) \end{align*},\\ +\infty &\text {else. } \end{array}\right. \end{gathered} $$

It's easy to verify that $\Theta, \Xi$ satisfy the condition of Fenchel-Rockafellar duality, so $$ \inf _{u\in E}[\Theta(u)+\Xi(u)] = \max _{\pi \in E^{*}}\left[-\Theta^{*}\left(-\pi\right)-\Xi^{*}\left(\pi\right)\right]. $$

It's clear that $$ \begin{align*} \inf _{u\in E}[\Theta(u)+\Xi(u)] &= \inf \left\{\int_{X} \varphi d \mu+\int_{Y} \psi d \nu \,\middle\vert\, (\varphi, \psi) \in C_b(X) \times C_b(Y) \text{ s.t. } \varphi \oplus \psi \geq -c \right\} \\ &=-\sup \left\{J(\varphi, \psi) \mid (\varphi, \psi) \in \Phi_{c}\cap C_{b}\right\}. \end{align*} $$

Next, we compute the Legendre-Fenchel transforms of $\Theta, \Xi$. First, for any $\pi \in E^*$, $$ \begin{aligned} \Theta^{*}(-\pi) =\sup _{u \in E}\left\{-\int u d \pi \,\middle\vert\, u \geq-c\right\} = \sup _{u \in E}\left\{\int u d \pi \,\middle\vert\, u \leq c\right\} . \end{aligned} $$

If $\pi$ is not nonnegative, then there exists a positive function $v \in E$ such that $\int v d \pi<0$. Then, the choice $u=\lambda v$, with $\lambda \rightarrow-\infty$, shows that the supremum is $+\infty$.
On the other hand, if $\pi$ is nonnegative, then the supremum is clearly $\int c d \pi$. This is because lower semi-continuous and bounded from below function is a limit of an increasing sequence of Lipschitz continuous functions.

Thus $$ \Theta^{*}(-\pi) = \begin{cases} \int c d \pi &\text {if } \pi \in M_{+}(X \times Y) \\ +\infty &\text {else}. \end{cases} $$

We also have $$ \begin{align*} \Xi^{*}(\pi) &= \sup_{u\in E} \left \{ \int ud\pi - \int \varphi d \mu- \int \psi d\nu \,\middle\vert\, u = \varphi \oplus \psi \text{ for some } (\varphi, \psi) \in C_b(X) \times C_b(Y) \right \} \\ &= \begin{cases} 0 &\text {if } \quad \forall(\varphi, \psi) \in C_b(X) \times C_b(Y) : \int \varphi \oplus \psi d \pi= \int \varphi d \mu+\int \psi d \nu \\ +\infty & \text {else} \end{cases} \\ &= \begin{cases} 0 &\text {if } \quad \pi \in \Pi(\mu, \nu) \\ +\infty & \text {else}. \end{cases} \end{align*} $$

It follows that $$ \max _{\pi \in E^{*}}\left[-\Theta^{*}\left(-\pi\right)-\Xi^{*}\left(\pi\right)\right] = \max _{\pi \in \Pi(\mu, \nu) \cap M_{+}(X \times Y)} -\int c d \pi = - \min _{\pi \in \Pi(\mu, \nu)} \int c d \pi. $$ Hence $$ -\sup \left\{J(\varphi, \psi) \mid (\varphi, \psi) \in \Phi_{c} \cap C_{b}\right\} = - \min _{\pi \in \Pi(\mu, \nu)} \int c d \pi. $$

$c$ is bounded.

Fix $$ \pi_* \in \operatorname{argmin}_{\pi \in \Pi(\mu, \nu)}I [\pi]. $$

Then $\pi_*$ is tight, so there are compact sets $X_n \subset X$ and $Y_n \subset Y$ such that $\mu(X_n^c), \nu(Y_n^c) \le \delta$. Let $Z_n := X_n \times Y_n$. Then $\pi (Z_n^c) \le 2\delta$. We define $\pi_n \in \mathcal P(Z_n)$ by $$ \pi_n := \frac{1_{Z_n}\pi_*}{\pi_* (Z_n)}. $$

Let $\mu_n := P_\sharp^{X_n} \pi_{n} \in \mathcal P(X_n)$ and $\nu_n := P_\sharp^{Y_n} \pi_{n} \in \mathcal P(Y_n)$ be marginals of $\pi_n$. Let $$ I_n[\pi] := \int_{Z_n} c d \pi \quad \forall \pi \in \Pi(\mu_n, \nu_n). $$

Fix $$ \tilde \pi_{n} \in \operatorname{argmin}_{\pi \in \Pi(\mu_n, \nu_n)} I_n [\pi]. $$

Define $\pi_{*n}$ by $$ \pi_{*n} := \pi_* (Z_n) \tilde \pi_{n} + 1_{Z_n^c} \pi_*. $$

It is easy to verify that $\pi_{*n} \in \Pi(\mu, \nu)$. We have $$ \begin{align} I[\pi_*] &\le I[\pi_{*n}] \\ &= \int_{X\times Y} c d \pi_{*n} \\ &= \pi_*(Z_n)\int_{Z_n} c d \tilde \pi_n + \int_{Z_n^c} c d \pi_* \\ &\le \pi_*(Z_n) I_n [\tilde \pi_n] + 2 \delta \|c\|_\infty \\ & \le I_n [\tilde \pi_n] + 2\delta \|c\|_\infty. \end{align} $$

Now we introduce $$ J_n: L^{1}(\mu_{n}) \times L^{1}(\nu_{n}) \to \mathbb R, (\varphi, \psi) \mapsto \int_{X_{n}} \varphi d \mu_{n}+\int_{Y_{n}} \psi d \nu_{n}. $$

Let $\Phi_{c,n}$ be the set of all $(\varphi, \psi) \in L^{1}(\mu_n) \times L^{1}(\nu_n)$ satisfying $\varphi(x)+\psi(y) \leq c(x, y)$ for $\mu_n$-almost all $x \in X_n$ and $\nu_n$-almost all $y \in Y_n$. By step 1, we know that $$ \inf_{\pi \in \Pi(\mu_n, \nu_n)} I_n [\pi] = \sup_{(\varphi, \psi) \in \Phi_{c,n}} J_n (\varphi, \psi). $$

In particular, there is $(\varphi_n, \psi_n) \in \Phi_{c,n}$ such that $$ J_n (\varphi_n, \psi_n) \ge \sup_{(\varphi, \psi) \in \Phi_{c,n}} J_n (\varphi, \psi) - \delta. $$

To ensure $\varphi_n (x)+\psi_n (y) \leq c(x, y)$ for all $x \in X_n$ and all $y \in Y_n$. We redefine $\varphi_n, \psi_n$ such that $\varphi_n, \psi_n$ take value $-\infty$ on their null sets respectively. WLOG, we assume $\delta <1$. Then $J_n (\varphi_n, \psi_n) \ge -1$. Then there is $(x_n, y_n) \in Z_n$ such that $$ \varphi_n(x_n)+\psi_n(y_n) \ge -1. $$

If we replace $(\varphi_n, \psi_n)$ by $(\varphi_n +s, \psi_n-s)$ for some real number $s$, we do not change the value of $J_n (\varphi_n, \psi_n)$, so the resulting couple is still admissible. By a proper choice of $s$, we can ensure $$ \varphi_n (x_n) \ge -\frac{1}{2} \quad \text{and} \quad\psi_n (y_n) \ge -\frac{1}{2}. $$

This implies for all $(x,y) \in Z_n$, $$ \varphi_n(x) \le c(x, y_n) - \psi_n(y_n) \le c(x, y_n) + 1/2,\\ \psi_n(y) \le c(x_n, y) - \varphi_n(x_n) \le c(x_n, y) + 1/2. $$

Define $\psi_n^c$ by $$ \psi_n^c (x) := \inf_{y\in Y_n} [c(x, y) - \psi_n(y)] \quad \forall x \in X_n. $$

Moreover, $$ \inf_{y\in Y_n} [c(x,y)-c(x_n, y)] - 1/2 \le \psi_n^c (x) \le c (x, y_n) - \psi_n (y_n) \le c(x, y_n) + 1/2. $$

It follows from $c$ is bounded that $\psi_n^c$ is bounded. Hence $\psi_n^c \in L^1(\mu)$. It follows from $\varphi_n(x)+\psi_n(y) \leq c(x, y)$ that $\varphi_n \le \psi_n^c$ on $X_n$. This implies $J_n (\psi_n^c , \psi_n) \ge J_n (\varphi_n, \psi_n)$. Similarly, we define $\psi_n^{cc}$ by $$ \psi_n^{cc} (y) := \inf_{x\in X_n} [c(x, y) - \psi_n^c (x)] \quad \forall y \in Y_n. $$

For all $y\in Y_n$, we have $$ \inf_{x\in X_n} [c(x,y)-c(x, y_n)] - 1/2 \le \psi_n^{cc} (y) \le c(x_n, y) - \psi_n^c (x_n) \le c(x_n, y) - \varphi_n (x_n) \le c(x_n,y)+1/2. $$

So $\psi_n^{cc}$ is also bounded and thus $\psi_n^{cc} \in L^1(\nu)$. Moreover, $$ \psi_n^c (x) + \psi_n^{cc} (y) = \psi_n^c (x) + \inf_{z \in X} [c(z, y) - \psi_n^c(z)] \le \psi_n^c(x) + [c(x, y) - \psi_n^c(x)] = c(x,y). $$

Then $(\psi_n^c, \psi_n^{cc}) \in \Phi_c$. Also $$ \psi_n^{cc} (y) = \inf_{x\in X_n} [c(x, y) - \psi_n^c(x)] \ge \inf_{x\in X_n} [c(x, y) - [c(c,y) - \psi_n (y)]] = \psi_n (y) \quad \forall y \in Y_n. $$

This implies $$ J_n (\psi_n^c , \psi_n^{cc}) \ge J_n(\psi_n^c, \psi_n) \ge J_n (\varphi_n, \psi_n). $$

In particular, $$ \psi_n^c (x) \ge - \|c\|_\infty -1/2 \quad \text{and} \quad \psi_n^{cc} (y) \ge - \|c\|_\infty -1/2. $$

Indeed, $$ \begin{align*} J(\psi_n^c, \psi_n^{cc}) &= \int_{X} \psi_n^c d \mu+\int_{Y} \psi_n^{cc} d \nu \\ &= \int_{X \times Y}[\psi_n^c (x) + \psi_n^{cc} (y)] d \pi_{*n}(x, y) \\ &= \pi_{*}[Z_n] \int_{Z_n}[ \psi_n^c (x) + \psi_n^{cc} (y)] d \tilde \pi_{n}(x, y) +\int_{Z_n^{c}}[\psi_n^c(x)+\psi_n^{cc}(y)] d \pi_{*}(x, y) \\ &\ge (1-2 \delta) \left (\int_{X_n} \psi_n^c d \mu_n+\int_{Y_n} \psi_n^{cc} d \nu_n \right )-(2\|c\|_{\infty}+1) \pi_{*}[Z_n^c] \\ &\geq(1-2 \delta) J_n( \psi_n^c , \psi_n^{cc} )-2(2\|c\|_{\infty}+1) \delta \\ &\geq(1-2 \delta) J_n({\varphi}_n, {\psi}_n)-2(2\|c\|_{\infty}+1) \delta \\ &\geq(1-2 \delta) \left ( \inf_{\pi \in \Pi(\mu_n, \nu_n)} I_n [\pi] - \delta \right)-2(2\|c\|_{\infty}+1) \delta \\ &= (1-2 \delta)( I_n [\tilde \pi_n] -\delta)-2(2\|c\|_{\infty}+1) \delta \\ &\ge (1-2 \delta)(I[\pi_{*}] -(2\|c\|_{\infty}+1) \delta)-2(2\|c\|_{\infty}+1) \delta. \end{align*} $$

The result then follows by taking the limit $\delta \to 0^+$.

Let us assume that $X, Y$ are not compact, and that $c$ is lower semi-continuous on $X \times Y$.

Best Answer

Related Solutions

Is there a stronger version of the Gluing Lemma

Proof of Kantorovich Duality – Optimal Transport Theory

Related Question