For $p=1$ one can bound the 1-Wasserstein metric by
$$|m-n| + \sqrt{\sum_{i=1}^{d} \left[
\left(
\sqrt{\lambda_i} - \sqrt{\gamma_i}\right)^2 + 2\sqrt{\lambda_i\gamma_i}(1-v_i\cdot u_i)
\right]}$$
when $\lambda_i$ and $\gamma_i$ are the $i^{th}$ eigen values of $U$ and $V$ respectively, $v_1,\ldots,v_d$ and $u_1,\ldots,u_d$ are the corresponding orthonormal basis of eigen-vectors.
See Chafai & Malrieu Lemma 2.4.
Although this bound seems close-in nature to the $p=2$ bound, I'm not sure if it is sharp.
I think I have an answer for the case p = 1, K = 2. I write "I think" because my computation does not coincide with the example values for $N=4$ posted earlier by OP in a comment, but I really cannot find any error in my proof, so I wanted to share it.
As already mentioned, we only need to consider permutation measures of the form $\mu = \sum_{i=1}^N \frac{1}{N} \delta_{i,\sigma(i)}$, since these are the extremal points of the set we optimize over.
For any such $\mu$, we can define a coupling $\pi$ by $\pi_{(i,\sigma(i)),(i,j)} = 1/N^2$ for $i,j \in \{1,...,N\}$ and $\pi_{(i,j),(i',j')} = 0$, if $i,j,i',j'$ are not of the form as before. That this is an admissible coupling in the definition of $W_1$ is trivial.
The corresponding "value" is
$$
\sum_{1\leq i,j,i',j' \leq N} ||(i,j)-(i',j')||_1 \pi_{(i,j),(i',j')} = \sum_{1 \leq i,j' \leq N} |\sigma(i) - j'| \frac{1}{N^2}.
$$
We further show that this value is minimal if $\mu$ is the monotonic or comonotonic measure, which will yield the claim. I only show this for the monotonic case, $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{i,i}$. Then for any admissible coupling $\hat{\pi}$, by the constraints in the calculation of the Wasserstein distances, we see
\begin{align*}
\sum_{1\leq i,j,i',j' \leq N} ||(i,j)-(i',j')||_1 \hat{\pi}_{(i,j),(i',j')} =& \sum_{1\leq i',j' \leq N} \sum_{1 \leq i \leq N} ||(i,i)-(i',j')||_1 \hat{\pi}_{(i,i),(i',j')} \\ \geq& \sum_{1\leq i',j' \leq N} \sum_{1 \leq i \leq N} ||(i',i')-(i',j')||_1 \hat{\pi}_{(i,i),(i',j')}\\ =&\sum_{1\leq i',j' \leq N} |i'-j'| \frac{1}{N^2},
\end{align*}
where the inequality is elementwise and the last term corresponds to the value for the coupling $\pi$.
Note that this only shows that monotonic/comonotonic measures are maximizers, but not that these are the only maximizers.
Best Answer
Edit: (This question may be better suited for math.stackexchange.)
This is true. It suffices to show the map $\mu\rightarrow W_{1}(\mu,\nu)$ is convex. Let $\mu_{i}\in\mathcal{P}(\Xi)$ and $\psi^{*}_i\in\Gamma(\mu_{i},\nu)$ be optimal transport plans between $\mu_{i}$ and $\nu$ for each $i=1,2$. Then $\lambda\psi_{1}^{*}+(1-\lambda)\psi_{2}^{*}\in\Gamma(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)$ and so
$W_{1}(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)=\inf\limits_{\gamma\in\Gamma(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)} \int d(x,y) d\gamma(x,y)\leq \\ \int d(x,y) d(\lambda\psi_{1}^{*}+(1-\lambda)\psi_{2}^{*})(x,y) = \lambda W_{1}(\mu_{1},\nu)+(1-\lambda)W_{1}(\mu_{2},\nu)$
which gives convexity. Thus, $$W_{1}(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)\leq \lambda W_{1}(\mu_{1},\nu)+(1-\lambda)W_{1}(\mu_{2},\nu)\leq \theta.$$
When restrictions are made such as replacing $\mathcal{P}(\Xi)$ with some other set of probability measures, as is done in the paper here, the answer can be no: