The joint probability distribution needed for the proof of Prop. 1 is the pushforward of $\mu$ under the map $x \mapsto (x,f(x))$ from $M$ to $M \times M$.
The heuristic argument given for Proposition 2 only applies to $p=1$; for general $p \ge 1$ the bound takes a slightly different form.
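Since $\int f \, d\mu =1$, for every $x \in M$ we have $|f(x)-1| = \left|\int (f(x)-f(y)) \, d\mu(y)\right| \le \int |f(x)-f(y)| \, d\mu(y)$, and since $f$ is $\alpha$-Lipschitz and $M$ has diameter $D$, this gives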
$$(*) \quad \forall x \in M \quad |f(x)-1| \le \alpha D \,.$$
The joint probability distribution needed for the proof of Prop. 2
is $\gamma$ defined as follows. Let $g(x)=f(x) \wedge 1$ and
$ \, c=1- \int g(x) \, d\mu(x) = \int (f-g) \, d\mu$ (using $\int f \, d\mu = 1$). Since $0 \le f-g \le |f-1|$, by $(*)$ we get
$$(**) \quad 0 \le c \le \alpha D \,.$$ If $c=0$ then $\mu=\mu_f$, so we may assume that $c>0$. Then define $\gamma$ via integration against bounded continuous functions $h \in C(M \times M)$ by
$$\int h(x,y) \, d\gamma(x,y):= \int h(x,x) g(x) \, d\mu(x) \\
+ c^{-1}\int \int h(x,y) (1 -g(x)) (f(y)-g(y)) \,d\mu(x) \, d\mu(y) \,.
$$
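Since $\int (1-g) \, d\mu = \int (f-g) \, d\mu = c$ (both equal $1-\int g \, d\mu$), testing with $h(x,y)=h_1(x)$ and $h(x,y)=h_2(y)$ shows that the marginals of $\gamma$ are $\mu$ and $f\mu = \mu_f$; as $1-g \ge 0$ and $f-g \ge 0$, $\gamma$ is a coupling of $\mu$ and $\mu_f$.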
Moreover, since $d(x,x)=0$ kills the diagonal term and $d \le D$ on $M \times M$,
$$\int d(x,y)^p \, d\gamma(x,y) \\ \le c^{-1}\int \int D^p (1 -g(x)) (f(y)-g(y)) \,d\mu(x) \, d\mu(y) =c D^p \le \alpha D^{p+1} \,,
$$
by $(**)$. Thus
$$W_p(\mu,\mu_f) \le (\alpha D^{p+1})^{1/p} \,.$$
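As a sanity check, the construction of $\gamma$ can be carried out on a finite metric space, where all integrals become finite sums. The following is a minimal Python sketch under assumptions that are not part of the answer: $M$ is four points on a line with $d(x,y)=|x-y|$ (so $D=3$), $\mu$ is uniform, and $f$ is a hypothetical $0.1$-Lipschitz density; `build_coupling` and `transport_cost` are illustrative names. It checks the marginals, $(**)$, and the bound $\int d^p \, d\gamma \le cD^p \le \alpha D^{p+1}$ for $p=2$.

```python
from itertools import product


def build_coupling(mu, f):
    """Discrete analogue of gamma on the points 0..n-1.

    mu[i] is the mass of point i (summing to 1) and f[i] is a density with
    sum(f[i] * mu[i]) == 1, so mu_f has masses f[i] * mu[i].
    Returns the coupling matrix gamma[i][j] and the constant c.
    """
    n = len(mu)
    g = [min(fi, 1.0) for fi in f]                   # g = f ∧ 1
    c = 1.0 - sum(gi * mi for gi, mi in zip(g, mu))  # c = 1 - ∫ g dmu
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        gamma[i][i] = g[i] * mu[i]                   # diagonal part: h(x, x) g(x)
    if c > 0:
        for i, j in product(range(n), repeat=2):
            # product part: c^{-1} (1 - g(x)) (f(y) - g(y)) dmu(x) dmu(y)
            gamma[i][j] += (1 - g[i]) * mu[i] * (f[j] - g[j]) * mu[j] / c
    return gamma, c


def transport_cost(gamma, dist, p):
    """Integral of d(x, y)^p against the coupling gamma."""
    n = len(gamma)
    return sum(gamma[i][j] * dist(i, j) ** p
               for i, j in product(range(n), repeat=2))


# Hypothetical example: points 0, 1, 2, 3 on a line, d(x, y) = |x - y|, D = 3.
xs = [0.0, 1.0, 2.0, 3.0]
mu = [0.25] * 4
f = [0.85 + 0.1 * x for x in xs]   # 0.1-Lipschitz density with ∫ f dmu = 1
alpha, D, p = 0.1, 3.0, 2

gamma, c = build_coupling(mu, f)
cost = transport_cost(gamma, lambda i, j: abs(xs[i] - xs[j]), p)
print(c, cost, c * D ** p, alpha * D ** (p + 1))
```

Here $c \approx 0.05$ and the cost for $p=2$ is about $0.331$, below $cD^p = 0.45$ and $\alpha D^{p+1} = 2.7$.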
Yes, $a \mapsto \mu_a$ is a measurable map from $\mathcal{A}$ to $\mathcal{P}_\mathcal{B}$.
The topology of weak convergence on $\mathcal{P}_\mathcal{B}$ is induced by the maps of the form $\mathcal{P}_\mathcal{B} \to \mathbb{R}, \,\nu \mapsto \nu\left(f\right)$, with $f \in C_b\left(\mathcal{B}\right)$ and
$$\nu(f) := \int_\mathcal{B} f \, d\nu,$$
and the Borel $\sigma$-algebra on $\mathcal{P}_\mathcal{B}$ is also generated by these functions. Thus it suffices to show that the maps $\mathcal{A} \to \mathbb{R}, \,a \mapsto \mu_a\left(f\right)$ are measurable for all $f \in C_b\left(\mathcal{B}\right)$.
Fix $f \in C_b\left(\mathcal{B}\right)$. Since $f$ is bounded, we can approximate it by a sequence of simple functions $\left(f_n\right)$ with $f_n \uparrow f$ uniformly. Each map $a \mapsto \mu_a\left(f_n\right)$ is measurable, being a finite linear combination of the measurable maps $a \mapsto \mu_a(A)$ with $A \subseteq \mathcal{B}$ Borel. Then $a \mapsto \mu_a\left(f\right)$ is measurable as the pointwise limit of measurable functions,
$$ a \mapsto \mu_a\left(f\right) = \lim_{n\to \infty} \left(a\mapsto \mu_a\left(f_n\right)\right).$$
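One concrete choice of the simple functions is the dyadic truncation $f_n = \lfloor 2^n f \rfloor / 2^n$ (an assumption: the answer does not fix a construction). For bounded $f$ it takes finitely many values, $f_n \le f_{n+1} \le f$, and $0 \le f - f_n < 2^{-n}$. A minimal Python sketch, where the kernel `mu_a` (a two-point mixture $(1-a)\delta_0 + a\delta_1$) and the choice $f = \cos$ are hypothetical illustrations:

```python
import math


def dyadic_approx(f, n):
    """Simple-function approximation f_n = floor(2^n f) / 2^n.

    For bounded f this takes finitely many values, f_n <= f_{n+1} <= f,
    and 0 <= f - f_n < 2^-n, so f_n increases to f uniformly.
    """
    scale = 2 ** n
    return lambda x: math.floor(scale * f(x)) / scale


def mu_a(a, h):
    """Hypothetical kernel: integral of h against (1 - a) delta_0 + a delta_1."""
    return (1 - a) * h(0.0) + a * h(1.0)


f = math.cos          # an element of C_b(R)
a = 0.3
for n in (1, 4, 8, 16):
    fn = dyadic_approx(f, n)
    # mu_a is a probability measure, so this error is at most 2**-n
    print(n, abs(mu_a(a, f) - mu_a(a, fn)))
```

Because every $\mu_a$ is a probability measure, $|\mu_a(f) - \mu_a(f_n)| \le \sup|f - f_n| \le 2^{-n}$, so the convergence $\mu_a(f_n) \to \mu_a(f)$ is even uniform in $a$.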
Another way to verify that $a\mapsto \mathcal{W}\left(\mu_a, \nu\right)$ is measurable is to use the fact that $\mathcal{P}_{\mathcal{B}\times\mathcal{B}}$ is a Polish space,
since $\mathcal{B}\times\mathcal{B}$ is.
Moreover, the map $$\pi \mapsto \int_{\mathcal{B}\times\mathcal{B}}d(x, y)\,\pi\left(dx, dy\right)$$
is continuous on $$\mathcal{P}_{\mathcal{B}\times\mathcal{B}}^1 :=\left\{\pi \in \mathcal{P}_{\mathcal{B}\times\mathcal{B}} : \int_{\mathcal{B}\times\mathcal{B}}\left[d\left(x', x\right) + d\left(y', y\right)\right]\pi\left(dx, dy\right) < \infty\right\}, $$
where $x', y' \in \mathcal{B}$ are arbitrary fixed points, provided $\mathcal{P}_{\mathcal{B}\times\mathcal{B}}^1$ carries the $1$-Wasserstein topology, under which it is again a Polish space. Hence the set
$$\Pi^1\left(\mu_a, \nu\right) := \left\{\pi \in \mathcal{P}_{\mathcal{B}\times\mathcal{B}}^1 : \pi\left(A \times \mathcal{B}\right) = \mu_a\left(A\right) \text{ and } \pi\left(\mathcal{B} \times A\right) = \nu\left(A\right) \text{ for all Borel } A \subseteq \mathcal{B}\right\} $$
is separable, and
$$\inf_{\pi \in \Pi^1\left(\mu_a, \nu\right)}\int_{\mathcal{B}\times\mathcal{B}}d(x, y)\,\pi\left(dx, dy\right) = \inf_{\pi \in Z}\int_{\mathcal{B}\times\mathcal{B}}d(x, y)\,\pi\left(dx, dy\right) $$
for any countable dense subset $Z \subset \Pi^1\left(\mu_a, \nu\right)$. Thus the infimum does not pose any measurability problems.
The pushforward $(f,g)_\sharp\mu$ of $\mu$ under the map $x\mapsto(f(x),g(x))$ is a coupling of $f_\sharp\mu$ and $g_\sharp\mu$, since its two marginals are exactly these measures; hence $$ W_2^2(f_\sharp\mu,g_\sharp\mu)\le\int_{\mathbb R^n\times\mathbb R^n}\Vert x-y\Vert^2\,d(f,g)_\sharp\mu(x,y)=\int_{\mathbb R^n}\Vert f(x)-g(x)\Vert^2\,d\mu(x). $$
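The inequality can be checked numerically in the simplest setting. This is a minimal Python sketch under assumptions not in the answer: dimension $1$ instead of $\mathbb R^n$, and $\mu$ the uniform empirical measure on finitely many points, so that $f_\sharp\mu$ and $g_\sharp\mu$ are uniform empirical measures with the same number of atoms and $W_2$ is attained by matching sorted values. The maps `f`, `g` and the Gaussian sample are hypothetical choices.

```python
import math
import random


def w2_squared_1d(xs, ys):
    """Squared W_2 between uniform empirical measures on xs and ys (same length).

    In 1D the monotone (sorted) matching is an optimal coupling.
    """
    assert len(xs) == len(ys)
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def coupling_cost(pts, f, g):
    """Cost of the coupling (f, g)#mu: integral of |f(x) - g(x)|^2 dmu(x),
    with mu the uniform empirical measure on pts."""
    return sum((f(x) - g(x)) ** 2 for x in pts) / len(pts)


def g(x):
    return x ** 3 - x   # hypothetical second map


random.seed(0)
pts = [random.gauss(0.0, 1.0) for _ in range(1000)]
f = math.sin            # hypothetical first map
lhs = w2_squared_1d([f(x) for x in pts], [g(x) for x in pts])
rhs = coupling_cost(pts, f, g)
print(lhs, rhs, lhs <= rhs)
```

The inequality can be strict: with $\mu = (\delta_0+\delta_1)/2$, $f(x)=x$ and $g(x)=1-x$, the coupling $(f,g)_\sharp\mu$ has cost $1$, while $f_\sharp\mu = g_\sharp\mu$, so $W_2 = 0$.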