Optimal Transport – How is This Transport Map Borel Measurable?

calculus-of-variations, measurable-functions, measure-theory, optimal-transportation, real-analysis

I'm reading Theorem 1.17 and its proof on page 14 of Santambrogio's Optimal Transport for Applied Mathematicians. The content is not hard, but it is a little long (because of the related details). If you are already familiar with optimal transport, feel free to scroll down to Theorem 1.17.


Let $X=Y=\Omega$ be a compact subset of $\mathbb{R}^d$. Let the cost function $c:X \times Y \to [0, \infty)$ be of the form $c(x, y)=h(x-y)$ for a strictly convex function $h: \mathbb R^d \to [0, \infty)$. By Kantorovich duality, there exist an optimal transport plan $\gamma$ and a Kantorovich potential $\varphi:X \to \mathbb R \cup \{-\infty\}$ such that
$$
\varphi(x)+\varphi^c(y) \leq c(x, y) \text { on } \Omega \times \Omega \text { and } \varphi(x)+\varphi^c(y)=c(x, y) \text { on } \operatorname{spt}(\gamma) .
$$

Here $\varphi^c:Y \to \mathbb R \cup \{-\infty\}$ is the $c$-conjugate of $\varphi$ and $\operatorname{spt}(\gamma)$ the support of $\gamma$. Because $c$ is Lipschitz, $\varphi, \varphi^c$ are real-valued and Lipschitz. Let us fix a point $\left(x_0, y_0\right) \in \operatorname{spt}(\gamma)$. One may deduce from the previous computations that
$$
x \mapsto \varphi(x)-c\left(x, y_0\right) \quad \text { is maximal at } x=x_0.
$$
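Indeed, combining the two relations above (and recalling that the $c$-conjugate is defined by $\varphi^c(y)=\inf_{x \in \Omega}\,[c(x, y)-\varphi(x)]$), for every $x \in \Omega$ we have
$$
\varphi(x)-c\left(x, y_0\right) \;\leq\; -\varphi^c\left(y_0\right) \;=\; \varphi\left(x_0\right)-c\left(x_0, y_0\right),
$$
where the inequality holds on all of $\Omega$ and the equality holds because $\left(x_0, y_0\right) \in \operatorname{spt}(\gamma)$; hence $x \mapsto \varphi(x)-c\left(x, y_0\right)$ attains its maximum at $x_0$.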

If $\varphi$ and $h$ are differentiable at $x_0$ and $x_0-y_0$, respectively, and $x_0 \notin \partial \Omega$, one gets $\nabla \varphi\left(x_0\right)=\nabla h\left(x_0-y_0\right)$. This works if the function $h$ is differentiable; if it is not, we shall write $\nabla \varphi\left(x_0\right) \in \partial h\left(x_0-y_0\right)$ (using the subdifferential of $h$, see Box 1.12). For a strictly convex function $h$, one may invert this relation by passing to $(\nabla h)^{-1}$, thus getting
$$
x_0-y_0=(\nabla h)^{-1}\left(\nabla \varphi\left(x_0\right)\right) .
$$

Notice that the expression $(\nabla h)^{-1}$ makes sense for strictly convex functions $h$, thanks to the considerations on the invertibility of $\partial h$ in Box 1.12.
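For instance, for the quadratic cost $h(z)=\frac{1}{2}|z|^2$ one has $\nabla h=(\nabla h)^{-1}=\mathrm{id}$, so the relation becomes $x_0-y_0=\nabla \varphi\left(x_0\right)$, i.e. $y_0=x_0-\nabla \varphi\left(x_0\right)$, which is the familiar expression of the optimal map for the quadratic cost.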

This formula gives the solution to the transport problem with this cost, provided $\varphi$ is differentiable a.e. with respect to $\mu$. This is usually guaranteed by requiring $\mu$ to be absolutely continuous with respect to the Lebesgue measure, and using the fact that $\varphi$ may be proven to be Lipschitz. Then, one may use the previous computation to deduce that, for every $x_0$, the point $y_0$ (whenever it exists) such that $\left(x_0, y_0\right) \in \operatorname{spt}(\gamma)$ is unique (i.e. $\gamma$ is of the form $\gamma_{\mathrm{T}}:=(\mathrm{id}, \mathrm{T})_{\#} \mu$ where $\mathrm{T}\left(x_0\right)=y_0$). Moreover, this also gives uniqueness of the optimal transport plan and of the gradient of the Kantorovich potential.

Box 1.12.

  • For every convex function $f: \mathbb{R}^d \rightarrow \mathbb{R} \cup\{+\infty\}$ we define its subdifferential at $x$ as the set
    $$
    \partial f(x)=\left\{p \in \mathbb{R}^d: f(y) \geq f(x)+p \cdot(y-x) \quad \forall y \in \mathbb{R}^d\right\}.
    $$

    It is possible to prove that $\partial f(x)$ is never empty if $x$ lies in the interior of the set $\{f<+\infty\}$. At every point $x$ where $f$ is differentiable, $\partial f(x)$ reduces to the singleton $\{\nabla f(x)\}$.
  • If $h$ is strictly convex, then $\partial h$, which is a multi-valued map, can be inverted and its inverse is single-valued; this yields a map $(\partial h)^{-1}$, which should be used in the statement of Theorem $1.17$ instead of $(\nabla h)^{-1}$.
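For a concrete picture in dimension one, take $h(x)=\frac{1}{2} x^2+|x|$, which is strictly convex but not differentiable at $0$. Then
$$
\partial h(x)= \begin{cases}\{x+1\} & x>0, \\ {[-1,1]} & x=0, \\ \{x-1\} & x<0,\end{cases} \qquad (\partial h)^{-1}(p)= \begin{cases}p-1 & p>1, \\ 0 & |p| \leq 1, \\ p+1 & p<-1,\end{cases}
$$
so $\partial h$ is multi-valued at $0$, while its inverse is single-valued and even continuous.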

We may summarize everything in the following theorem:

Theorem 1.17. Given $\mu$ and $\nu$, probability measures on a compact domain $\Omega \subset \mathbb{R}^d$, there exists an optimal transport plan $\gamma$ for the cost $c(x, y)=h(x-y)$ with $h$ strictly convex. It is unique and of the form (id, $T)_{\#} \mu$, provided $\mu$ is absolutely continuous and $\partial \Omega$ is negligible. Moreover, there exists a Kantorovich potential $\varphi$, and $\mathrm{T}$ and $\varphi$ are linked by
$$
\mathrm{T}(x) = x-(\nabla h)^{-1}(\nabla \varphi(x)) .
$$

Proof.

  • The previous theorems give the existence of an optimal $\gamma$ and an optimal $\varphi$. The previous considerations show that if we take a point $\left(x_0, y_0\right) \in \operatorname{spt}(\gamma)$ where $x_0 \notin \partial \Omega$ and $\nabla \varphi\left(x_0\right)$ exists, then necessarily we have $y_0=x_0-(\nabla h)^{-1}\left(\nabla \varphi\left(x_0\right)\right)$.

  • The points $x_0$ on the boundary are negligible by assumption. The points where the differentiability fails are Lebesgue-negligible by Rademacher's theorem. Indeed, $\varphi$ shares the same modulus of continuity as $c$, which is a Lipschitz function on $\Omega \times \Omega$ since $h$ is locally Lipschitz continuous and $\Omega$ is bounded. Hence, $\varphi$ is also Lipschitz.

  • From the absolute continuity assumption on $\mu$, these two sets of "bad" points (the boundary and the nondifferentiability points of $\varphi$ ) are $\mu$-negligible as well. This shows at the same time that every optimal transport plan is induced by a transport map and that this transport map is $x \mapsto x-(\nabla h)^{-1}(\nabla \varphi(x))$. Hence, it is uniquely determined (since the potential $\varphi$ does not depend on $\gamma$ ). As a consequence, we also have uniqueness of the optimal $\gamma$.
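To make the formula $\mathrm{T}(x)=x-(\nabla h)^{-1}(\nabla \varphi(x))$ concrete, here is a minimal numerical sketch in one dimension. Everything in it is an assumption chosen for illustration (none of it is from the book): the quadratic cost $h(z)=z^2/2$ (so $(\nabla h)^{-1}=\mathrm{id}$), $\mu=\mathrm{Uniform}([0,1])$, $\nu=\mathrm{Uniform}([0,2])$, and the potential $\varphi(x)=-x^2/2$, for which the map is $\mathrm{T}(x)=2x$; the script only checks empirically that $\mathrm{T}_{\#}\mu=\nu$.

```python
# Minimal numerical sketch (assumed example, not from the book):
# check T(x) = x - (∇h)^{-1}(∇φ(x)) in one dimension with the quadratic cost
# h(z) = z^2/2, so (∇h)^{-1} is the identity.
# Assumed data: μ = Uniform([0,1]), ν = Uniform([0,2]), potential φ(x) = -x^2/2,
# for which T(x) = x - φ'(x) = 2x should push μ forward to ν.
import numpy as np

def grad_phi(x):
    return -x                      # φ(x) = -x^2/2  =>  φ'(x) = -x  (assumed potential)

def transport_map(x):
    return x - grad_phi(x)         # T(x) = x - (∇h)^{-1}(∇φ(x)) with (∇h)^{-1} = id

rng = np.random.default_rng(0)
samples_mu = rng.uniform(0.0, 1.0, size=200_000)   # samples of μ
pushed = transport_map(samples_mu)                 # should be ~ Uniform([0,2])

grid = np.linspace(0.0, 2.0, 21)
empirical_cdf = np.array([(pushed <= t).mean() for t in grid])
target_cdf = grid / 2.0                            # CDF of Uniform([0,2])
print("max CDF discrepancy:", np.abs(empirical_cdf - target_cdf).max())  # small (sampling error only)
```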


My question

I'm fine with the fact that there is a $\mu$-null Borel subset $N$ of $\Omega$ such that the gradient $\nabla \varphi: N^c \to \mathbb R^d$ is Borel measurable. Of course, the transport map $T$ must be Borel measurable. However, it's possible that a non-measurable map has a measurable graph.

Could you elaborate on how the inverse $(\nabla h)^{-1}$ is Borel measurable?

Best Answer

$\newcommand\p\partial\newcommand{\R}{\mathbb R}\newcommand\ep\varepsilon$Let $h\colon\R^d\to\R$ be a strictly convex function.

Consider the set $S:=2^{\R^d}$ of all subsets of $\R^d$ endowed with the Hausdorff distance $d_H$. As usual, let $\p h(x)$ denote the subdifferential of the function $h$ at a point $x\in\R^d$; we will identify $(\R^d)^*$ with $\R^d$. Thus, we have the map \begin{equation*} \R^d\ni x\mapsto\p h(x)\in R, \tag{1}\label{1} \end{equation*} where $R:=\{\p h(x)\colon x\in\R^d\}\subseteq S$.

It suffices to prove

Proposition 1: The map $\p h$ in \eqref{1} is invertible and its inverse is continuous.

Proof: Using horizontal shifts, without loss of generality (wlog) it is enough to show that, for any sequence $(x_k)$ in $\R^d$,
\begin{equation*} \text{if $d_H(\p h(x_k),\p h(0))\to0$ then $x_k\to0$.} \tag{2}\label{2} \end{equation*}

By subtracting an appropriate affine function from $h$, wlog we may and will assume that \begin{equation*} h(0)=0\quad\text{and}\quad 0\in\p h(0),\quad\text{so that}\quad h\ge0. \end{equation*}

To obtain a contradiction, suppose that \eqref{2} is false. Then there exist a real $\ep>0$, a sequence $(x_k)$ in $\R^d$, and a sequence $(a_k)$ in $\R^d$ such that for all $k$ \begin{equation*} a_k\in\p h(x_k)\quad\text{and}\quad |x_k|\ge\ep, \quad\text{whereas}\quad a_k\to0. \end{equation*}

Since $h$ is convex and $a_k\in\p h(x_k)$, for all $k$ we have $0=h(0)\ge h(x_k)+a_k\cdot(0-x_k)$ and hence \begin{equation*} 0\le h(x_k)\le a_k\cdot x_k, \tag{3}\label{3} \end{equation*} with $\cdot$ standing for the dot product. Let also $|\cdot|$ denote the Euclidean norm. Letting now \begin{equation*} y_k:=\frac\ep{|x_k|}\,x_k=\frac\ep{|x_k|}\,x_k+\Big(1-\frac\ep{|x_k|}\Big)0 \end{equation*} we have $|y_k|=\ep$ and, using the convexity of $h$ again and looking back at \eqref{3}, we get \begin{equation*} 0\le h(y_k)\le\frac\ep{|x_k|}\,h(x_k) \le\frac\ep{|x_k|}\,a_k\cdot x_k =a_k\cdot y_k\to0, \end{equation*} so that $h(y_k)\to0$. Passing to subsequences, wlog we have $y_k\to y$ for some $y\in\R^d$ with $|y|=\ep$, so that $y\ne0$.

On the other hand, the convex function $h\colon\R^d\to\R$ is necessarily continuous. So, $h(y)=\lim_k h(y_k)=0$. So, for all $t\in[0,1]$, once again by the convexity of $h$, we have $0\le h(ty)\le(1-t)h(0)+th(y)=0$, so that $h(ty)=0$ for all $t\in[0,1]$, which contradicts the strict convexity of $h$. $\quad\Box$
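For what it's worth, Proposition 1 can also be probed numerically. The sketch below is only an illustration under an assumed smooth, strictly convex $h(z)=\frac14|z|^4+\frac12|z|^2$ (so $\nabla h(z)=(|z|^2+1)z$): it inverts $\nabla h$ by Newton's method and checks that small perturbations of $p$ move $(\nabla h)^{-1}(p)$ only slightly, as the continuity statement predicts.

```python
# Numerical probe of Proposition 1 (illustration only, with an assumed h):
# h(z) = |z|^4/4 + |z|^2/2 on R^d is smooth and strictly convex, with
# ∇h(z) = (|z|^2 + 1) z.  We invert ∇h by Newton's method and check that
# p ↦ (∇h)^{-1}(p) is continuous: nearby p's have nearby preimages.
import numpy as np

def grad_h(z):
    return (np.dot(z, z) + 1.0) * z

def hess_h(z):
    d = z.size
    return (np.dot(z, z) + 1.0) * np.eye(d) + 2.0 * np.outer(z, z)   # positive definite

def inv_grad_h(p, tol=1e-12, max_iter=100):
    """Solve ∇h(z) = p for z by Newton's method."""
    z = np.zeros_like(p)
    for _ in range(max_iter):
        step = np.linalg.solve(hess_h(z), grad_h(z) - p)
        z = z - step
        if np.linalg.norm(step) < tol:
            break
    return z

rng = np.random.default_rng(1)
p = rng.normal(size=3)
z = inv_grad_h(p)
print("residual |grad_h(z) - p| =", np.linalg.norm(grad_h(z) - p))   # ~ machine precision

# continuity check: a small change in p should move the preimage only slightly
for eps in (1e-2, 1e-4, 1e-6):
    dz = np.linalg.norm(inv_grad_h(p + eps * rng.normal(size=3)) - z)
    print("perturbation size", eps, "-> change in preimage", dz)
```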