Different versions of the entropy term in the entropy-regularized Wasserstein distance

Tags: entropy, optimal-transport, probability-distributions, probability-theory, statistics

\begin{equation}
\mathcal{W}_\epsilon(\alpha, \beta) = \min_{\pi\in \Pi(\alpha, \beta)} \int c(x,y) \,\mathrm{d}\pi(x,y) + \epsilon H(\pi \| \alpha \otimes \beta)
\end{equation}

Cuturi (2013) introduced the entropy-regularized Wasserstein distance, or Sinkhorn distance, shown above, where $\epsilon$ is the regularization parameter and $H(\pi \| \alpha \otimes \beta)$ is the relative entropy (KL divergence) between the transport plan and the product of the marginals $\alpha \otimes \beta$.
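
For concreteness, here is a minimal NumPy sketch of the Sinkhorn fixed-point iterations that solve this problem for discrete measures (the function `sinkhorn`, its parameters, and the toy data are illustrative choices of mine, not from Cuturi's paper or any particular library):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=500):
    """Approximate the entropy-regularized OT plan between discrete
    marginals a (shape n) and b (shape m) with cost matrix C (n x m)."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # enforce row marginals
        v = b / (K.T @ u)                # enforce column marginals
    return u[:, None] * K * v[None, :]   # transport plan pi

# Toy example: two random discrete measures on a 1-D grid
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
a = rng.random(50); a /= a.sum()
b = rng.random(50); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
pi = sinkhorn(a, b, C)
print((pi * C).sum())                    # regularized transport cost
```

As $\epsilon \to 0$ the regularized cost approaches the unregularized Wasserstein cost, while larger $\epsilon$ makes the iterations converge faster at the price of a more diffuse plan.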

But I have seen the $H(\cdot)$ term shown in two different ways, one with entropy and the other with relative entropy:

\begin{align}
H(\pi) &= \int \pi(x,y) \ln \pi(x,y) \,\mathrm{d}x \,\mathrm{d}y \\
H(\pi \| \alpha \otimes \beta) &= \int \ln \left(\frac{\mathrm{d}\pi (x,y)}{\mathrm{d}\alpha(x) \mathrm{d}\beta(y) } \right) \mathrm{d}\pi (x,y)
\end{align}

How are these last two expressions connected? They're obviously not identical, so why are there two different versions in circulation?

Best Answer

These two are actually equivalent up to an additive constant when $\pi$ is a coupling of $\alpha$ and $\beta$. I'll assume that $\pi$, $\alpha$, and $\beta$ all have densities. We can then write:

$$ H(\pi \| \alpha \otimes \beta) = \int \ln \left( \frac{\mathrm{d}\pi(x,y)}{\mathrm{d}\alpha(x)\,\mathrm{d}\beta(y)} \right) \mathrm{d}\pi(x,y) = \int \pi(x,y) \ln \left( \frac{\pi(x,y)}{\alpha(x)\,\beta(y)} \right) \mathrm{d}x \,\mathrm{d}y $$

Note that $\pi(x,y)$ is the density with respect to the Lebesgue measure, and the same can be said for $\alpha(x)$ and $\beta(y)$. Therefore:

\begin{align}
H(\pi \| \alpha \otimes \beta) &= \int \pi(x,y) \ln \pi(x,y) \,\mathrm{d}x \,\mathrm{d}y - \int \pi(x,y) \ln \alpha(x) \,\mathrm{d}x \,\mathrm{d}y - \int \pi(x,y) \ln \beta(y) \,\mathrm{d}x \,\mathrm{d}y \\
&= \int \pi(x,y) \ln \pi(x,y) \,\mathrm{d}x \,\mathrm{d}y - \int \alpha(x) \ln \alpha(x) \,\mathrm{d}x - \int \beta(y) \ln \beta(y) \,\mathrm{d}y \\
&= H(\pi) - H(\alpha) - H(\beta),
\end{align}

where the second equality uses the marginal constraints of the coupling, $\int \pi(x,y) \,\mathrm{d}y = \alpha(x)$ and $\int \pi(x,y) \,\mathrm{d}x = \beta(y)$.

Since $\alpha$ and $\beta$ are fixed in the minimization over $\pi$, the terms $-H(\alpha) - H(\beta)$ are constant, so $H(\pi \| \alpha \otimes \beta) = H(\pi) + C$ for a constant $C$. The two regularizers therefore differ only by a constant and yield the same optimal plan.
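
As a quick sanity check, one can verify the identity numerically for a discrete coupling (a sketch under my own conventions; `neg_entropy` is a hypothetical helper, and the sign convention $H(p) = \int p \ln p$ follows the question):

```python
import numpy as np

rng = np.random.default_rng(1)
pi = rng.random((4, 5)); pi /= pi.sum()  # an arbitrary joint distribution
a = pi.sum(axis=1)                       # its row marginal (alpha)
b = pi.sum(axis=0)                       # its column marginal (beta)

def neg_entropy(p):
    """Negative entropy, matching the question's convention H(p) = sum p log p."""
    return np.sum(p * np.log(p))

kl = np.sum(pi * np.log(pi / np.outer(a, b)))        # H(pi || alpha x beta)
print(np.isclose(kl, neg_entropy(pi) - neg_entropy(a) - neg_entropy(b)))  # True
```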
