As @lukanz mentioned, what you are trying to prove is not true. Here is an even simpler counterexample: consider the sequence $\delta_{\frac{1}{n}}$ (Dirac measures centred at $1/n$). It is easy to check that
\begin{align}
\lVert\delta_0 -\delta_{1/n} \rVert_{\mathrm{TV}}=1\, , \quad \forall \, n \geq 1 \, .
\end{align}
On the other hand, choosing the coupling $(X,X+1/n)$ where $X\sim \delta_0$, one has that
\begin{align}
W_1\left(\delta_0,\delta_{\frac1n}\right) \leq \frac1n\, .
\end{align}
So your inequality cannot hold true. The inequality the other way does hold true but with a constant and using a weighted version of the total variation distance. You can find the statement and proof in Chapter 6 of Villani's book Optimal Transport: Old and New.
Consider a graph $G$ on $\{1,\ldots,n\}$ where $\mu_i$ is assigned to node $i$, and an edge $\{i,j\}$ in $G$ indicates that the coupling 0f $\mu_i$ and $\mu_j$ is required to be optimal. If $G$ has no cycles (i.e., it is a forest) then the proof of the gluing lemma (see, e.g., [1]), applied separately to each tree of the forest, extends to show that a global coupling can be arranged to be optimal on all edges of $G$. Conversely, we next verify that if $G$ contains a cycle, then there is a choice of the measures $\mu_i$ where such a global optimal coupling does not exist.
Let us first show this for a cycle of length 3:
Suppose the variable $X$ is uniform in $\{1,2,3\}$, the variable $Y$ is uniform in $\{2,3,4\}$ and $Z$ is uniform in $\{1,3,4\}$.
Under the optimal coupling $P_1$ of $(X,Y)$, we have $P_1(Y=4|X=1)=1$.
Under the optimal coupling $P_2$ of $(Y,Z)$, we have $P_2(Z=4|Y=4)=1$.
Under the optimal coupling $P_3$ of $(X,Z)$, we have $P_3(Z=4|X=1)=0$.
This shows that there is no coupling of $(X,Y,Z)$ that projects to the optimal couplings $P_1, P_2, P_3$.
For a cycle of length $k>3$, simply assign two adjacent nodes of the cycle the laws of $X$ and $Y$, and assign all other nodes the law of $Z$.
[1] Villani, Cedric. Optimal transport: old and new. Vol. 338. Berlin: springer, 2009.
Best Answer
Wasserstein metrics are a family, parametrized by $p\in [1,\infty]$. My answer focusses on the case $p=2$, which is the standard one for geometric purposes. Explicit computations are possible for various examples.
General metric spaces:
for the pair $(\mu,\delta_x)$, any $\mu$, the coupling is unique hence optimal and the obvious one: $\mu\otimes\delta_x$.
for $(\mu,\nu)$ with $\mu=\sum_i^m a_i \delta_{x_i}$ and $\nu=\sum_j^n b_j \delta_{x_j}$ (convex combinations of Dirac's), the set of all couplings $\pi$ between $\mu$ and $\nu$ is bijective to the set $B^{n,m}$ of $m\times n$ bistochastic matrices. For small $m,n$ it is a good exercise to compute the integral $\int d^2(x,y)d\pi$ and try to minimize over $B^{n,m}$.
Euclidean spaces:
Gaussian OT for $\mu_1,\mu_2$ Gaussian measures on $\mathbb R^n$, it was shown in [A] that the optimal coupling is itself a Gaussian measure on $\mathbb R^{2n}$. Furthermore, the distance is given by $$W_2(\mu_1,\mu_2)=\sqrt{|m_1-m_2|^2 + \mathrm{tr}(M_1)+\mathrm{tr}(M_2)-2\mathrm{tr}\left[\big(\sqrt{M_1}M_2\sqrt{M_1}\big)^{1/2}\right]}$$ where $\mu_i$ has expectation $m_i$ and (positive definite) covariance matrix $M_i$.
semi-discrete OT Let $K$ be a convex subset of $\mathbb R^n$ with Lebesgue measure $\lambda_n K=1$. For $(\lambda_n,\mu)$ with $\mu=\sum_{i=1}^Na_i\delta_{x_i}$, $N\in \mathbb N\cup \{+\infty\}$ purely atomic, there exist $N$ pairwise disjoint convex subsets $U_i$ of $K$ with $\lambda_n U_i=a_i$ and such that $\pi=\sum_{i=1}^N \mathbb 1_{U_i}\lambda_n \otimes\delta_{x_i}$. If $N$ is finite, $\{U_i\}_i$ is a Laguerre tessellation of $K$, see e.g. [B]. If $a_i=1/N$, then this tessellation is the Voronoi tessellation with centers $x_i$.
Beside these explicit examples there are very many others, treating more sophisticated distributions, usually on the real line. Furthermore, there exist a huge literature about characterizations of optimal transport maps, say $T$, by all sorts of methods (mostly, descriptions of $T$ as solution to PDEs). If an optimal transport map exists, then it is unique, and the unique optimal coupling satisfies $\pi=(\mu, T_*\mu)$.
[A] C.R. Givens and R.M. Shortt A class of Wasserstein metrics for probability distributions Michigan Math. J., 1984
[B] C. Lautensack and S. Zuyev. Random Laguerre tessellations. Adv. in Appl. Probab., 40(3):630–650, 2008.