The Wasserstein Metric. Computational Optimal Transport. Weights.

algorithmsconvex optimizationentropyoptimal-transportregularization

Let $\mu,\nu$ be two probability measures on the space $\mathbb{R}^n$, and let $\Pi(\mu,\nu)$ be the space of joint probability measures with marginals $\mu$ and $\nu$. After a discretisation of space (and entropic regularisation), one is able to compute the standard 2-Wasserstein distance

$$ W_2(\mu,\nu):= \inf_{\pi\in \Pi(\mu,\nu)} \Big( \int \|x-y\|^2 \pi(dxdy)\Big)^{1/2}, $$

using Sinkhorn algorithm (see https://arxiv.org/abs/1306.0895 ). Can anyone explain or point me to a reference for an algorithm for numerically computing the weighted Wasserstein :

$$ W_{A}(\mu,\nu):= \inf_{\pi\in \Pi(\mu,\nu)} \Big( \int \langle A(x-y) , x-y \rangle \pi(dx,dy) \Big)^{1/2} $$

Here $A\in \mathbb{R}^{n\times n} $ is positive definite (symmetric invertible). Or better of the entropy regularised version :

$$ W_{A,\epsilon}(\mu,\nu):= \inf_{\pi\in \Pi(\mu,\nu)} \Big( \int \langle A(x-y) , x-y \rangle \pi(dx,dy) \Big)^{1/2}+ \epsilon \int \pi(x,y) \log \pi(x,y) dxdy $$

Best Answer

Just to support the response of Giacomo, you can use a generic Optimal Transport solver for the cost $c(x, y) = \langle A(x-y), x-y \rangle$. For instance, supposing empirical measures $\hat{\mu}$ and $\hat{\nu}$, with support $X = [x_{1}, x_{2}, \cdots, x_{n}]$ (resp. $Y$) and sample weights $\{a_{i}\}_{i=1}^{n}$ (resp. $\{b_{j}\}_{j=1}^{m}$), these are defined as,

$$\hat{\mu}(x) = \sum_{i=1}^{n}a_{i}\delta(x-x_{i}),$$ $$\hat{\nu}(y) = \sum_{i=1}^{m}b_{j}\delta(y-y_{j}),$$

You can check that using these definitions, the set $\Pi(\hat{\mu}, \hat{\nu})$ is finite (so we may substitute the inf by a min), and the optimization problem becomes:

$$\pi^{\star} = \underset{\pi \in \Pi(\hat{\mu}, \hat{\nu})}{\text{min}}\sum_{i=1}^{n}\sum_{j=1}^{m}C_{ij}\pi_{ij}$$

Now, as you may be aware, this is a (finite) linear program on the entries $\pi_{ij}$, for which there are known algorithms for solving it (e.g. Dantzig's algorithm). Sinkhorn's algorithm is a clever solution for approximating the solution $\pi^{\star}$. As Giacomo mentioned, "Computational Optimal Transport" presents a broad discussion on the subject.

Now, answering your question concerning the cost function, you can easily implement the cost on a programming language of your choice. All you need to note is that,

\begin{align*} C_{ij} &= c(x_{i},y_{j})\\ &= \biggr(A(x_{i}-y_{j})\biggr)^{T}(x_{i}-y_{j})\\ &= (x_{i}-y_{j})^{T}A^{T}(x_{i}-y_{j})\\ &= (x_{i}-y_{j})^{T}A(x_{i}-y_{j}) \end{align*}

Personally, I recommend using Python, as there are good toolboxes for Optimal Transport, such as Python Optimal Transport (POT).

Side-topic: Python implementation

You may be wondering why I wrote $C_{ij} = (x_{i}-y_{j})^{T}A(x_{i}-y_{j})$. Well, the lib Scipy of Python has a high-performance function for computing pairwise distances, called cdist. Reading the documentation of this function, note that the expression for $C_{ij}$ looks a lot like the Mahalanobis distance. Indeed, $A$ in the latter expression is the $VI$ argument in the function. Thus, you can automatically program your cost function as,

from functools import partial
from scipy.spatial.pairwise import cdist

A = ... # definition of your positive-definite matrix of shape (p, p)
a = ... # Sample weights of \hat{\mu}, with shape (n,)
b = ... # Sample weights of \hat{\mu}, with shape (m,)
X = ... # Support of \hat{\mu}, with shape (n, p)
Y = ... # Support of \hat{\nu}, with shape (m, p)
C = cdist(X, Y, metric='mahalanobis', VI=A) # Cost matrix with shape (n, m)
pi = ot_solver(a, b, C) # optimal transport plan with shape (n, m)
```
Related Question