Let $\mu,\nu$ be two probability measures on the space $\mathbb{R}^n$, and let $\Pi(\mu,\nu)$ be the space of joint probability measures with marginals $\mu$ and $\nu$. After a discretisation of space (and entropic regularisation), one is able to compute the standard 2-Wasserstein distance
$$ W_2(\mu,\nu):= \inf_{\pi\in \Pi(\mu,\nu)} \Big( \int \|x-y\|^2 \pi(dxdy)\Big)^{1/2}, $$
using the Sinkhorn algorithm (see https://arxiv.org/abs/1306.0895). Can anyone explain, or point me to a reference for, an algorithm for numerically computing the weighted Wasserstein distance:
$$ W_{A}(\mu,\nu):= \inf_{\pi\in \Pi(\mu,\nu)} \Big( \int \langle A(x-y) , x-y \rangle \pi(dx,dy) \Big)^{1/2} $$
Here $A\in \mathbb{R}^{n\times n}$ is symmetric positive definite (hence invertible). Or better, the entropy-regularised version:
$$ W_{A,\epsilon}(\mu,\nu):= \inf_{\pi\in \Pi(\mu,\nu)} \bigg( \Big( \int \langle A(x-y) , x-y \rangle \, \pi(dx,dy) \Big)^{1/2} + \epsilon \int \pi(x,y) \log \pi(x,y) \, dx\, dy \bigg) $$
Best Answer
Just to support Giacomo's response: you can use a generic Optimal Transport solver for the cost $c(x, y) = \langle A(x-y), x-y \rangle$. For instance, suppose we have empirical measures $\hat{\mu}$ and $\hat{\nu}$ with supports $X = [x_{1}, x_{2}, \ldots, x_{n}]$ and $Y = [y_{1}, y_{2}, \ldots, y_{m}]$ and sample weights $\{a_{i}\}_{i=1}^{n}$ and $\{b_{j}\}_{j=1}^{m}$, respectively. These are defined as
$$\hat{\mu}(x) = \sum_{i=1}^{n}a_{i}\delta(x-x_{i}), \qquad \hat{\nu}(y) = \sum_{j=1}^{m}b_{j}\delta(y-y_{j}).$$
You can check that, with these definitions, the set $\Pi(\hat{\mu}, \hat{\nu})$ is a compact polytope in finitely many dimensions (so the infimum is attained and we may replace it by a minimum), and the optimization problem becomes:
$$\pi^{\star} = \underset{\pi \in \Pi(\hat{\mu}, \hat{\nu})}{\arg\min}\,\sum_{i=1}^{n}\sum_{j=1}^{m}C_{ij}\pi_{ij},$$ where $C_{ij} := c(x_{i}, y_{j})$ is the cost matrix.
Now, as you may be aware, this is a finite-dimensional linear program in the entries $\pi_{ij}$, for which there are well-known solvers (e.g. Dantzig's simplex algorithm). Sinkhorn's algorithm is a clever way of approximating the solution $\pi^{\star}$ via the entropy-regularised problem. As Giacomo mentioned, "Computational Optimal Transport" presents a broad discussion of the subject.
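To make the linear-programming view concrete, here is a minimal sketch that solves the discrete problem above with SciPy's general-purpose LP solver (the weights and cost matrix are made-up toy data; a dedicated OT solver such as POT's `ot.emd` would be faster in practice):

```python
import numpy as np
from scipy.optimize import linprog

n, m = 3, 4
rng = np.random.default_rng(0)
a = np.full(n, 1.0 / n)          # weights a_i of mu-hat
b = np.full(m, 1.0 / m)          # weights b_j of nu-hat
C = rng.random((n, m))           # toy cost matrix C_ij

# Marginal constraints on the flattened (row-major) coupling pi:
# sum_j pi_ij = a_i  and  sum_i pi_ij = b_j.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0
for j in range(m):
    A_eq[n + j, j::m] = 1.0
b_eq = np.concatenate([a, b])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method="highs")
pi_star = res.x.reshape(n, m)    # optimal coupling pi*
```

The optimal transport cost is then `np.sum(pi_star * C)`.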
Now, answering your question concerning the cost function: you can easily implement the cost in a programming language of your choice. All you need to note is that
\begin{align*} C_{ij} &= c(x_{i},y_{j})\\ &= \bigl(A(x_{i}-y_{j})\bigr)^{T}(x_{i}-y_{j})\\ &= (x_{i}-y_{j})^{T}A^{T}(x_{i}-y_{j})\\ &= (x_{i}-y_{j})^{T}A(x_{i}-y_{j}), \end{align*}
where the last equality uses the symmetry of $A$.
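In NumPy, the whole cost matrix can be computed at once with a vectorised quadratic form; the points and the matrix $A$ below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 4, 5, 3
X = rng.normal(size=(n, d))      # support points x_i of mu-hat
Y = rng.normal(size=(m, d))      # support points y_j of nu-hat
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)      # a symmetric positive definite A

# C[i, j] = (x_i - y_j)^T A (x_i - y_j), with no explicit Python loops
diff = X[:, None, :] - Y[None, :, :]            # shape (n, m, d)
C = np.einsum('nmd,de,nme->nm', diff, A, diff)
```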
Personally, I recommend using Python, as there are good toolboxes for Optimal Transport, such as Python Optimal Transport (POT).
Side-topic: Python implementation
You may be wondering why I wrote $C_{ij} = (x_{i}-y_{j})^{T}A(x_{i}-y_{j})$. Python's SciPy library has a high-performance function for computing pairwise distances, called cdist. Reading its documentation, note that the expression for $C_{ij}$ is exactly the squared Mahalanobis distance between $x_{i}$ and $y_{j}$. Indeed, $A$ plays the role of the $VI$ argument of the function (cdist returns the distance itself, so its output must be squared). Thus, you can program your cost function as follows.
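A minimal self-contained sketch (the data below are random placeholders):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, m, d = 4, 5, 3
X = rng.normal(size=(n, d))      # support points of mu-hat
Y = rng.normal(size=(m, d))      # support points of nu-hat
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)      # a symmetric positive definite A

# cdist's 'mahalanobis' metric returns sqrt((x - y)^T VI (x - y)),
# so squaring it recovers C_ij = (x_i - y_j)^T A (x_i - y_j)
C = cdist(X, Y, metric='mahalanobis', VI=A) ** 2
```

This `C` can then be fed to any discrete OT solver, e.g. POT's `ot.emd(a, b, C)` for the exact problem or `ot.sinkhorn(a, b, C, reg)` for the entropy-regularised one.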