I think I have an answer for the case $p = 1$, $K = 2$. I write "I think" because my computation does not match the example values for $N=4$ that the OP posted earlier in a comment, but I really cannot find an error in my proof, so I wanted to share it.
As already mentioned, we only need to consider permutation measures of the form $\mu = \sum_{i=1}^N \frac{1}{N} \delta_{i,\sigma(i)}$, since these are the extremal points of the set we optimize over.
For any such $\mu$, we can define a coupling $\pi$ by $\pi_{(i,\sigma(i)),(i,j)} = 1/N^2$ for $i,j \in \{1,\dots,N\}$, and $\pi_{(i,j),(i',j')} = 0$ whenever the pair of indices is not of this form. It is straightforward to check that this is an admissible coupling in the definition of $W_1$: the first marginal is $\mu$, and the second marginal is the uniform measure on the grid $\{1,\dots,N\}^2$.
The corresponding "value" is
$$
\sum_{1\leq i,j,i',j' \leq N} ||(i,j)-(i',j')||_1 \pi_{(i,j),(i',j')} = \sum_{1 \leq i,j' \leq N} |\sigma(i) - j'| \frac{1}{N^2}.
$$
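As a quick numerical sanity check (a brute-force sketch, with the grid $\{1,\dots,N\}^2$ and the uniform second marginal as in the construction above), one can enumerate all permutations for $N=4$ and observe that the value above is in fact the same for every $\sigma$:

```python
from itertools import permutations

N = 4
# value of the coupling pi for each permutation sigma of {1,...,N}
values = {
    sum(abs(sigma[i] - j) for i in range(N) for j in range(1, N + 1)) / N**2
    for sigma in permutations(range(1, N + 1))
}
print(values)  # a single element: the value does not depend on sigma
```

For $N=4$ the common value is $\frac{1}{16}\sum_{i,j}|i-j| = 20/16 = 1.25$.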
We further show that this value is the minimal coupling cost, i.e. that $\pi$ is optimal, when $\mu$ is the monotonic or comonotonic measure, which will yield the claim: since $\sigma$ is a bijection, $\sum_{i,j'} |\sigma(i)-j'| = \sum_{i,j'} |i-j'|$, so the upper bound above is the same for every $\sigma$, and optimality in the monotonic/comonotonic case makes that common value the maximal value of $W_1$. I only show this for the monotonic case, $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{i,i}$. Then for any admissible coupling $\hat{\pi}$, by the marginal constraints in the calculation of the Wasserstein distance, we see
\begin{align*}
\sum_{1\leq i,j,i',j' \leq N} ||(i,j)-(i',j')||_1 \hat{\pi}_{(i,j),(i',j')} =& \sum_{1\leq i',j' \leq N} \sum_{1 \leq i \leq N} ||(i,i)-(i',j')||_1 \hat{\pi}_{(i,i),(i',j')} \\ \geq& \sum_{1\leq i',j' \leq N} \sum_{1 \leq i \leq N} ||(i',i')-(i',j')||_1 \hat{\pi}_{(i,i),(i',j')}\\ =&\sum_{1\leq i',j' \leq N} |i'-j'| \frac{1}{N^2},
\end{align*}
where the inequality holds term by term by the triangle inequality ($|i-i'| + |i-j'| \geq |i'-j'|$), and the last term is exactly the value attained by the coupling $\pi$.
Note that this only shows that monotonic/comonotonic measures are maximizers, but not that these are the only maximizers.
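To double-check the monotonic case numerically, the transport problem can be solved exactly as a linear program. The sketch below uses plain `scipy.optimize.linprog` (the uniform second marginal on the grid is the same assumption as above) and recovers $W_1 = \frac{1}{N^2}\sum_{i',j'}|i'-j'|$ for $N=4$:

```python
import numpy as np
from scipy.optimize import linprog

N = 4
pts = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]
n = len(pts)
# mu: monotonic permutation measure on the diagonal; nu: uniform on the grid
mu = np.array([1 / N if i == j else 0.0 for (i, j) in pts])
nu = np.full(n, 1 / n)
# l1 ground cost between grid points
cost = np.array([abs(a - c) + abs(b - d) for (a, b) in pts for (c, d) in pts])
# marginal constraints: row sums equal mu, column sums equal nu
A = np.zeros((2 * n, n * n))
for r in range(n):
    A[r, r * n:(r + 1) * n] = 1  # first marginal at pts[r]
    A[n + r, r::n] = 1           # second marginal at pts[r]
res = linprog(cost, A_eq=A, b_eq=np.concatenate([mu, nu]), bounds=(0, None))
print(res.fun)  # 1.25 = (1/16) * sum_{i',j'} |i'-j'| for N = 4
```

The LP optimum matches the lower bound from the displayed chain of inequalities, confirming that $\pi$ is optimal in the monotonic case for $N=4$.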
Edit: (This question may be better suited for math.stackexchange.)
This is true. It suffices to show that the map $\mu \mapsto W_{1}(\mu,\nu)$ is convex. Let $\mu_{i}\in\mathcal{P}(\Xi)$ and let $\psi^{*}_i\in\Gamma(\mu_{i},\nu)$ be an optimal transport plan between $\mu_{i}$ and $\nu$ for each $i=1,2$. Then $\lambda\psi_{1}^{*}+(1-\lambda)\psi_{2}^{*}\in\Gamma(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)$, and so
$$W_{1}(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)=\inf_{\gamma\in\Gamma(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)} \int d(x,y)\, d\gamma(x,y) \leq \int d(x,y)\, d\big(\lambda\psi_{1}^{*}+(1-\lambda)\psi_{2}^{*}\big)(x,y) = \lambda W_{1}(\mu_{1},\nu)+(1-\lambda)W_{1}(\mu_{2},\nu),$$
which gives convexity. Thus, if $W_{1}(\mu_{i},\nu)\leq\theta$ for $i=1,2$, then $$W_{1}(\lambda\mu_{1}+(1-\lambda)\mu_{2},\nu)\leq \lambda W_{1}(\mu_{1},\nu)+(1-\lambda)W_{1}(\mu_{2},\nu)\leq \theta.$$
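The convexity inequality is easy to illustrate numerically with SciPy's exact one-dimensional `wasserstein_distance`; the particular sample sizes and distributions below are arbitrary choices for the sketch:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 200)  # sample representing mu_1
x2 = rng.normal(3.0, 1.0, 200)  # sample representing mu_2
y = rng.normal(1.0, 2.0, 200)   # sample representing nu
lam = 0.3

# the mixture lam*mu_1 + (1-lam)*mu_2 as a weighted empirical measure
w_mix = wasserstein_distance(
    np.concatenate([x1, x2]), y,
    u_weights=np.concatenate([np.full(200, lam), np.full(200, 1 - lam)]))
bound = (lam * wasserstein_distance(x1, y)
         + (1 - lam) * wasserstein_distance(x2, y))
print(w_mix <= bound + 1e-12)  # True: W_1(mixture, nu) <= convex combination
```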
When restrictions are made such as replacing $\mathcal{P}(\Xi)$ with some other set of probability measures, as is done in the paper here, the answer can be no:
> [For $\mathcal{P}(\Xi)$ replaced with $\mathcal{N}(\Xi)$, Gaussian distributions] ...the Wasserstein ambiguity set $P$ is nonconvex (as mixtures of normal distributions are generically not normal).
Best Answer
It is true and clear if the metric space $X$ has a finite diameter, but false in general: Take $\beta_i=2^{-i}$ and $\mu_i$ the point mass at $3^i$.
Details: In the case $D = \operatorname{diam}(X)<\infty$, write $s_k=\sum_{i \le k} \beta_i$ and $s=\sum_{i=1}^{\infty} \beta_i$. Then $$\nu_\infty=\frac{s_k}{s} \nu_k+\frac{s-s_k}{s} \gamma_k$$ for the probability measure $$\gamma_k=\frac{\sum_{i > k} \beta_i \mu_i}{s-s_k} \,.$$ Therefore $$\nu_\infty-\nu_k=\frac{s-s_k}{s} (\gamma_k-\nu_k)\,,$$ whence $$\mathcal{W}(\nu_k,\nu_{\infty}) \le \frac{s-s_k}{s} D\,,$$ because the diameter of $\mathcal{P}(X)$ in the Wasserstein metric is $D$.
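The failure in the infinite-diameter counterexample can also be seen numerically. Taking $\nu_k$ to be the normalized partial sums $\frac{1}{s_k}\sum_{i\le k}\beta_i\mu_i$ with $\beta_i = 2^{-i}$ and $\mu_i = \delta_{3^i}$, the consecutive distances $\mathcal{W}(\nu_k,\nu_{k+1})$ grow without bound, so $(\nu_k)$ is not even Cauchy; the sketch below (the index range is an arbitrary truncation) uses SciPy's 1-D Wasserstein distance, which normalizes the supplied weights:

```python
import numpy as np
from scipy.stats import wasserstein_distance

pts = 3.0 ** np.arange(1, 16)    # atoms of mu_i = delta_{3^i}
beta = 2.0 ** -np.arange(1, 16)  # weights beta_i = 2^{-i}

# W(nu_k, nu_{k+1}) for the normalized partial sums nu_k
dists = [
    wasserstein_distance(pts[:k], pts[:k + 1],
                         u_weights=beta[:k], v_weights=beta[:k + 1])
    for k in range(1, 15)
]
print(dists)  # grows roughly like (3/2)^k: (nu_k) is not Cauchy
```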