If $X$ is not separable then the topology of weak convergence on $P(X)$ may not be metrizable at all.
I don't know of a simple ZFC example, but for instance, suppose $X$ is a measurable cardinal with $d$ the discrete metric. Let $\mu$ be the corresponding 0-1 valued probability measure, and let $C$ be the set of "atomic" probability measures, i.e. those with countable support. It's not hard to show that $C$ is weakly dense in $P(X)$ (this is true for any metric space), but in this space $C$ is also weakly sequentially closed. Yet $\mu \notin C$.
The joint probability distribution needed for the proof of Prop. 1 is the pushforward of $\mu$ under the map $x \mapsto (x,f(x))$ from $M$ to $M \times M$.
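As a sanity check of the pushforward construction above, here is a minimal sketch for a finitely supported $\mu$. It builds the measure on pairs $(x, f(x))$ and reads off both marginals. The names `pushforward_coupling` and `marginals`, the uniform measure and the map `f` are illustrative assumptions, not part of the original argument.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward_coupling(mu, f):
    """Pushforward of a finitely supported mu under x -> (x, f(x)).

    mu is a dict {point: mass}; the result is a dict {(x, y): mass}.
    """
    gamma = defaultdict(Fraction)
    for x, mass in mu.items():
        gamma[(x, f(x))] += mass
    return dict(gamma)

def marginals(gamma):
    """Return the first and second marginals of a measure on pairs."""
    first, second = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), mass in gamma.items():
        first[x] += mass
        second[y] += mass
    return dict(first), dict(second)

# Hypothetical example: mu uniform on {0, 1, 2, 3}, f(x) = x // 2.
mu = {x: Fraction(1, 4) for x in range(4)}
f = lambda x: x // 2

gamma = pushforward_coupling(mu, f)
first, second = marginals(gamma)
print(first == mu)   # the first marginal is mu itself
print(second)        # the second marginal is the image measure of mu under f
```

The first marginal is always $\mu$ and the second is the image of $\mu$ under $f$, so the construction is a coupling of those two measures that is concentrated on the graph of $f$.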
The heuristic argument given for Prop. 2 applies only to $p=1$; for larger $p$ the bound takes a different form, derived below.
Since $\int f \, d\mu =1$, we have
$$(*) \quad \forall x \in M \quad |f(x)-1| \le \alpha D \,.$$
The joint probability distribution needed for the proof of Prop. 2 is $\gamma$, defined as follows. Let $g(x)=f(x) \wedge 1$ and $c=1- \int g(x) \, d\mu(x)$, so by $(*)$,
$$(**) \quad 0 \le c \le \alpha D \,.$$
If $c=0$ then $\mu=\mu_f$, so we may assume that $c>0$. Then define $\gamma$ via integration against bounded continuous functions $h \in C(M \times M)$ by
$$\begin{aligned}
\int h(x,y) \, d\gamma(x,y) := {} & \int h(x,x) g(x) \, d\mu(x) \\
& + c^{-1}\int \int h(x,y) (1 -g(x)) (f(y)-g(y)) \,d\mu(x) \, d\mu(y) \,.
\end{aligned}$$
It is easy to verify that $\gamma$ is a coupling of $\mu$ and $\mu_f$.
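Moreover, the diagonal term vanishes because $d(x,x)=0$, and $\int (1-g) \, d\mu = \int (f-g) \, d\mu = c$, so
$$\begin{aligned}
\int d(x,y)^p \, d\gamma(x,y) &\le c^{-1}\int \int D^p (1 -g(x)) (f(y)-g(y)) \,d\mu(x) \, d\mu(y) \\
&= c D^p \le \alpha D^{p+1} \,,
\end{aligned}$$
by $(**)$. Thus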
$$W_p(\mu,\mu_f) \le (\alpha D^{p+1})^{1/p} \,.$$
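The construction of $\gamma$ can be checked numerically on a finite space. The following is only a sketch under stated assumptions: $M$ is a three-point subset of the line with $d(x,y)=|x-y|$, $\mu$ is uniform, and the density values in `f` are chosen by hand so that $\int f \, d\mu = 1$. The function name `build_gamma` is hypothetical.

```python
from fractions import Fraction as Fr

def build_gamma(mu, f):
    """Discrete analogue of gamma for weights mu[i] and density values f[i]."""
    g = [min(fi, Fr(1)) for fi in f]                  # g = f ∧ 1
    c = 1 - sum(gi * mi for gi, mi in zip(g, mu))     # c = 1 - ∫ g dμ
    if c == 0:
        raise ValueError("c = 0 means mu_f = mu; no off-diagonal part is needed")
    n = len(mu)
    gamma = [[Fr(0)] * n for _ in range(n)]
    for i in range(n):
        gamma[i][i] += g[i] * mu[i]                   # diagonal part: h(x,x) g(x) dμ(x)
        for j in range(n):
            # product part: c^{-1} (1 - g(x)) (f(y) - g(y)) dμ(x) dμ(y)
            gamma[i][j] += (1 - g[i]) * mu[i] * (f[j] - g[j]) * mu[j] / c
    return gamma, c

# Assumed toy data: three points on the line, uniform mu, density f with mean 1.
points = [0, 1, 2]
mu = [Fr(1, 3)] * 3
f = [Fr(1, 2), Fr(1), Fr(3, 2)]

gamma, c = build_gamma(mu, f)
n = len(points)
rows = [sum(gamma[i]) for i in range(n)]                       # first marginal
cols = [sum(gamma[i][j] for i in range(n)) for j in range(n)]  # second marginal

D = max(points) - min(points)   # diameter of M
p = 2
cost = sum(abs(x - y) ** p * gamma[i][j]
           for i, x in enumerate(points)
           for j, y in enumerate(points))

print(rows == mu)                                   # marginal mu
print(cols == [fi * mi for fi, mi in zip(f, mu)])   # marginal mu_f
print(cost <= c * D ** p)                           # transport cost bound
```

Exact rational arithmetic is used so that the marginal identities hold with equality rather than up to rounding.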
Let $p$ and $q$ be two arbitrary probability measures on $[0,1]$. By definition,
$$d_W(p,q) = \inf_{\gamma \in \Gamma_{p,q}} \int |x-y| \, d\gamma(x,y)$$
where $\Gamma_{p,q}$ is the set of Borel probability measures on $[0,1] \times [0,1]$ with marginals $p$ and $q$. Since every $\gamma \in \Gamma_{p,q}$ has marginals $p$ and $q$, we have
$$\int_{[0,1]} x \, dp(x) = \int_{[0,1] \times [0,1]} x \, d\gamma(x,y) \quad \text{and} \quad \int_{[0,1]} y \, dq(y) = \int_{[0,1] \times [0,1]} y \, d\gamma(x,y).$$
Hence, writing $E_p = \int x \, dp(x)$ and $E_q = \int y \, dq(y)$ for the two means,
$$|E_p-E_q| = \left| \int_{[0,1] \times [0,1]} (x-y) \, d\gamma(x,y) \right| \leq \int_{[0,1] \times [0,1]} |x-y| \, d\gamma(x,y).$$
Since this inequality holds for any $\gamma \in \Gamma_{p,q}$ we can take the infimum over all $\gamma \in \Gamma_{p,q}$ to conclude
$$|E_p-E_q| \leq d_W(p,q).$$
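The inequality can be checked on small discrete examples. The sketch below relies on a fact not used in the proof above: on the real line, $d_W(p,q) = \int |F_p(t) - F_q(t)| \, dt$, where $F_p$ and $F_q$ are the cumulative distribution functions. The helper names `mean` and `w1_on_line` and the two measures are illustrative assumptions.

```python
from fractions import Fraction as Fr

def mean(measure):
    """Expectation of a finitely supported measure given as {point: mass}."""
    return sum(x * w for x, w in measure.items())

def w1_on_line(p, q):
    """1-Wasserstein distance on the line via the CDF formula (assumed, see text)."""
    xs = sorted(set(p) | set(q))
    total, cdf_p, cdf_q = 0, 0, 0
    for left, right in zip(xs, xs[1:]):
        cdf_p += p.get(left, 0)
        cdf_q += q.get(left, 0)
        # Both CDFs are constant on [left, right), so integrate exactly.
        total += abs(cdf_p - cdf_q) * (right - left)
    return total

# Hypothetical measures on [0, 1].
p = {Fr(0): Fr(1, 4), Fr(1): Fr(3, 4)}
q = {Fr(1, 2): Fr(1)}

gap = abs(mean(p) - mean(q))
dist = w1_on_line(p, q)
print(gap, dist, gap <= dist)
```

Taking $p=\delta_0$ and $q=\delta_1$ instead gives equality, so the bound cannot be improved in general.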