I solved the problem using the MOSEK solver and got similar results:
import numpy as np
import cvxpy as cp

K = 600  # number of samples from each distribution
D = 25   # dimension

# Samples from N(0, I) and from N(0, 4I).
train = np.random.multivariate_normal(np.zeros(D), np.eye(D), size=K).T
test = np.random.multivariate_normal(np.zeros(D), 4 * np.eye(D), size=K).T

# Cost matrix of pairwise Euclidean distances.
M = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        M[i, j] = np.linalg.norm(train[:, i] - test[:, j], 2)

# Transport plan with uniform marginals 1/K on both sides.
Tra = cp.Variable((K, K), nonneg=True)
constraints = [Tra @ np.ones((K, 1)) == np.ones((K, 1)) / K,
               Tra.T @ np.ones((K, 1)) == np.ones((K, 1)) / K]
optprob = cp.Problem(cp.Minimize(cp.trace(Tra.T @ M)), constraints)
optprob.solve(solver=cp.MOSEK)
print(np.trace(Tra.value.T @ M))
The result is around 8.3 in dimension 25, which is not surprising: for continuous random variables the Wasserstein distance is a continuous optimization problem over all possible joint probability distributions, while for the empirical distributions we in fact optimize over the matrix $T$, which has a higher optimal objective value than the continuous case.
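As a quick cross-check (assuming the POT package, installed via pip install pot, is available; it is not used in the computation above), the same linear program can be solved with ot.emd2:

import numpy as np
import ot  # POT: Python Optimal Transport (an assumption, not part of the original setup)

K, D = 600, 25
train = np.random.multivariate_normal(np.zeros(D), np.eye(D), size=K)
test = np.random.multivariate_normal(np.zeros(D), 4 * np.eye(D), size=K)

M = ot.dist(train, test, metric='euclidean')  # K x K matrix of pairwise distances
a = b = np.ones(K) / K                        # uniform marginals
print(ot.emd2(a, b, M))                       # exact LP optimum; should also be around 8.3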
Fix measures $\mu_0,\mu_1,\nu_0,\nu_1$ on the same probability space with $\mu_0\ll\nu_0$ and $\mu_1\ll\nu_1$, i.e. the interesting case. For $\alpha\in[0,1]$ let $\mu_\alpha=(1-\alpha)\mu_0+\alpha\mu_1$ and $\nu_\alpha=(1-\alpha)\nu_0+\alpha\nu_1$, and notice that $\mu_\alpha\ll\nu_\alpha$. We want to prove the convexity inequality $\mathcal H(\mu_\alpha\|\nu_\alpha)\le(1-\alpha)\mathcal H(\mu_0\|\nu_0)+\alpha\mathcal H(\mu_1\|\nu_1)$.
For the following argument, I prefer to work with random variables, so let $B_\alpha\in\{0,1\}$ be Bernoulli with success probability $\alpha$.
Let $X_\alpha$ given $B_\alpha$ have the law $\mu_{B_\alpha}$, and let $Y_\alpha$ given $B_\alpha$ have the law $\nu_{B_\alpha}$.
Using this machinery, I can simply write $\mathcal H(X_\alpha\|Y_\alpha)=\mathcal H((1-\alpha)\mu_0+\alpha\mu_1\|(1-\alpha)\nu_0+\alpha\nu_1)$.
The right-hand side of the inequality is the conditional relative entropy of $X_\alpha$ given $B_\alpha$ with respect to $Y_\alpha$ given $B_\alpha$. Since $\mathcal H(B_\alpha\|B_\alpha)=0$ trivially, the conditional relative entropy equals the joint relative entropy; that is, we have $\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)=(1-\alpha)\mathcal H(X_0\|Y_0)+\alpha\mathcal H(X_1\|Y_1)$, which we verify by establishing that the Radon-Nikodym derivative of the law of $(B_\alpha,X_\alpha)$ with respect to the law of $(B_\alpha,Y_\alpha)$ coincides with the $(X_0,Y_0)$-derivative on $\{B_\alpha=0\}$ and with the $(X_1,Y_1)$-derivative on $\{B_\alpha=1\}$.
Now let us introduce some notation for the conditional relative entropy, say $\mathcal H(X_\alpha|B_\alpha\|Y_\alpha|B_\alpha)$; what we just discussed is that
$\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)=\mathcal H(B_\alpha\|B_\alpha)+\mathcal H(X_\alpha|B_\alpha\|Y_\alpha|B_\alpha)=\mathcal H(X_\alpha|B_\alpha\|Y_\alpha|B_\alpha)$, by the chain rule for the relative entropy. This chain rule goes two ways, so we also have
$\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)=\mathcal H(X_\alpha\|Y_\alpha)+\mathcal H(B_\alpha|X_\alpha\|B_\alpha|Y_\alpha)$.
So the convexity of the relative entropy is just a special case of the chain rule, using the non-negativity of the conditional relative entropy, $\mathcal H(B_\alpha|X_\alpha\|B_\alpha|Y_\alpha)\ge 0$.
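Combining the two decompositions gives the whole argument in one line:
$$\mathcal H(X_\alpha\|Y_\alpha)=\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)-\mathcal H(B_\alpha|X_\alpha\|B_\alpha|Y_\alpha)\le(1-\alpha)\mathcal H(X_0\|Y_0)+\alpha\mathcal H(X_1\|Y_1).$$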
Unfortunately, the conditional relative entropy is usually only defined for finite supports, via kernels, or under other restrictions, since it seems we need access to some kind of conditional distribution. However, this is not the case: the conditional relative entropy can be defined in general, as discussed here, namely by using the chain rule as the definition. Non-negativity is also shown there, which amounts to a straightforward application of Jensen's inequality.
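On finite supports the inequality is also easy to sanity-check numerically; here is a minimal sketch using scipy.special.rel_entr (the test distributions below are arbitrary, not from the argument above):

import numpy as np
from scipy.special import rel_entr  # elementwise x*log(x/y)

def kl(p, q):
    # relative entropy H(p||q) of discrete distributions
    return rel_entr(p, q).sum()

rng = np.random.default_rng(0)
mu0, mu1, nu0, nu1 = (rng.dirichlet(np.ones(5)) for _ in range(4))
for alpha in np.linspace(0, 1, 11):
    lhs = kl((1 - alpha) * mu0 + alpha * mu1, (1 - alpha) * nu0 + alpha * nu1)
    rhs = (1 - alpha) * kl(mu0, nu0) + alpha * kl(mu1, nu1)
    assert lhs <= rhs + 1e-12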
Best Answer
This does not quite answer your question, as your question concerns $W_1$ instead of $W_2^2$, but since you have not gotten an answer so far, I feel it may be helpful to provide some related results.
The following is an excerpt from Lectures on Optimal Transport (Lecture 8, Section 1) by Ambrosio, Brué and Semola; it asserts that no such bound can hold between the $L_2$ distance of the densities and the $W_2^2$ distance of the associated measures:
Perhaps you can find a similar argument to show the same result for $W_1$. Once I find anything else relating to your problem, I will edit my answer to provide more information.
EDIT
I have now found a similar argument which shows that no such bound can hold for $W_1$: Let $\mu$ be supported on $[a,b-H]$ with density $\phi$, and for $h<H$ define the shifted measure $\mu_h$ to be the measure with density $\phi_h(x):=\phi(x-h)$. Then all such measures are concentrated on $[a,b]$. Note that, just as in the excerpt I posted above, we have $W_1(\mu,\mu_h)=h$. Similarly, if we choose $\phi$ and $h$ such that $\operatorname{supp}\phi\cap \operatorname{supp}\phi_h=\emptyset$ (for instance, $\phi$ supported on an interval of length less than $h$), then $$\Vert \phi-\phi_h\Vert_{L_1}=2\Vert\phi_h\Vert_{L_1}=2.$$ Now for any $C>0$ we can pick $h$ small enough that $$\Vert \phi-\phi_h\Vert_{L_1}=2>CW_1(\mu,\mu_h)=C\cdot h,$$ and hence there cannot exist a $C>0$ such that the desired inequality holds for all $\mu,\nu$ supported on $[a,b]$.
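A quick numerical illustration of this blow-up, assuming SciPy is available (scipy.stats.wasserstein_distance computes the 1D $W_1$ between empirical samples; the particular width and shift below are arbitrary choices):

import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
w, h = 0.005, 0.01                 # support width w < shift h, so the densities are disjoint
x = rng.uniform(0.0, w, 100_000)   # samples from mu, a uniform density phi on [0, w]
y = x + h                          # samples from the shifted measure mu_h
print(wasserstein_distance(x, y))  # ~ h = 0.01, since mu_h is a translation of mu
# ||phi - phi_h||_{L_1} = 2 exactly (disjoint supports), so the ratio 2/h
# exceeds any fixed C once h is small enough.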