For $p=1$ one can bound the 1-Wasserstein metric by
$$|m-n| + \sqrt{\sum_{i=1}^{d} \left[
\left(
\sqrt{\lambda_i} - \sqrt{\gamma_i}\right)^2 + 2\sqrt{\lambda_i\gamma_i}(1-v_i\cdot u_i)
\right]}$$
when $\lambda_i$ and $\gamma_i$ are the $i^{th}$ eigen values of $U$ and $V$ respectively, $v_1,\ldots,v_d$ and $u_1,\ldots,u_d$ are the corresponding orthonormal basis of eigen-vectors.
See Chafai & Malrieu Lemma 2.4.
Although this bound seems close-in nature to the $p=2$ bound, I'm not sure if it is sharp.
Yes this is true, formally this follows by the envelope theorem. In an abstract and very smooth setting, the envelope theorem says that for an objective functional depending on a parameter $t$
$$
F(t)=\max\limits_z f(t,z),
$$
then the derivative of the optimal value can be computed as
$$
\frac{dF}{dt}(t)=\partial_t f(t,z_t)
\qquad
\mbox{for any smooth selection of a maximizer }z_t \mbox{ of }F(t).
$$
This can be seen easily: for any such choice of a maximizer, just apply a chain rule and use the optimality condition of $z_t$ in the maximization problem for fixed $t$:
$$
F'(t)=\frac d{dt}f(t,z_t)=\partial_t f(t,z_t)+\underbrace{\partial_zf(t,z_t)}_{=0}\frac {dz_t}{dt}.
$$
This means, roughly speaking, that one can simply forget that the minimizer varies, only the variation of the functional matter.
In your specific context, you are trying to differentiate (w.r.t $\rho$) the optimal value of the optimization problem given by the Kantorovich dual formulation
$$
W^2(\rho,\eta)
=F_\eta(\rho)
=\max\limits_\phi f_\eta(\rho,\phi)
=\max\limits_\phi \left\{\int \rho\phi+\int\eta\phi^c\right\}
$$
(here $\eta$ is fixed once and for all, I'm mimicking my $F,f$ notations above to give some perspective and I hope the notation is sufficiently self-explanatory).
Although the Kantorovich potential $\phi$ from $\rho$ to $\eta$ (the optimizer) varies when $\rho$ varies, the envelope theorem strongly suggests that you can actually argue as it did not vary at leading order (same for its $c$-transform $\phi^c$), and one can simply differentiate the functional w.r.t. the varying "parameter" $\rho$. Since the Kantorovich functional is linear in $\rho$, the conclusion is indeed that the first variation is given by $ \frac{\partial f_\eta}{\partial_\rho}(\rho,\phi)=\phi$.
Of course various subtle problems may arise owing essentially to the infinite-dimensional setting and functional-analytic details, but this is the rough idea.
For a completely rigorous statement and proof I can recommend Filippo Santambrogio's book [1], in particular chapter 7 and Proposition 7.17
[1] Santambrogio, Filippo. "Optimal transport for applied mathematicians." Birkäuser, NY 55.58-63 (2015): 94.
Best Answer
$\newcommand{\si}{\sigma}\newcommand{\W}{\mathcal W}\newcommand{\R}{\mathbb R} $Yes, the map \begin{equation*} \mu \mapsto J(\mu):=\int_\R T^\si_\mu(x+y)\exp(-y^2/2\si^2)\,dy \tag{10}\label{10} \end{equation*} is continuous on the 2-Wasserstein space, say $\W_2$.
Indeed, let $\mu_n\to\mu$ in $\W_2$ (as $n\to\infty$). Suppose that $X_n\sim\mu_n$ and $X\sim\mu$. Then $X_n\to X$ in distribution and $EX_n^2\to EX^2$. Letting $Z$ be a standard normal random variable independent of the $X_n$'s and $X$, we get $X_n+\si Z\to X+\si Z$ in distribution, so that \begin{equation*} F_{(\mu_n)_\si}\to F_{\mu_\si} \tag{20}\label{20} \end{equation*} pointwise. Also, the function $F_{\nu_\si}$ is continuous and strictly increasing, so that the inverse function $F^{-1}_{\nu_\si}$ is continuous (and strictly increasing). So, by \eqref{20}, \begin{equation*} T_{\mu_n}^\si\to T_\mu^\si \tag{30}\label{30} \end{equation*} pointwise.
By Markov's inequality, $F_{\nu_\si}(z)\le C/z^2$ for real $z<0$, where $C:=\int_\R x^2\,\nu(dx)<\infty$, and hence \begin{equation*} |F^{-1}_{\nu_\si}(u)|\le\sqrt{C/u} \tag{40}\label{40} \end{equation*} if $0<u<F_{\nu_\si}(0)$ (and $F_{\nu_\si}(0)\in(0,1)$). On the other hand, letting $m_n$ and $m$ denote the medians of $(\mu_n)_\si$ and $\mu_\si$, respectively, and letting $\Phi_\si$ denote the cdf of $N(0,\si^2)$, we see that \eqref{20} implies $m_n\to m$ and hence \begin{equation*} F_{(\mu_n)_\si}(z)=\int_\R F_{\mu_n}(z-y)d\Phi_\si(y) \\ \ge\int_{-\infty}^{z-m_n}F_{\mu_n}(m_n)d\Phi_\si(y) =\frac12\,\Phi_\si(z-m_n)=\exp\left(-\frac{z^2}{2\si^2+o(1)}\right) \end{equation*} uniformly in $n$ as $z\to-\infty$. So, by \eqref{40}, \begin{equation*} |T_{\mu_n}^\si(z)|=|F^{-1}_{\nu_\si}(F_{(\mu_n)_\si}(z))| \le \exp\frac{z^2}{4\si^2+o(1)} \tag{50}\label{50} \end{equation*} uniformly in $n$ as $z\to-\infty$. Similarly, \eqref{50} holds in the right-tail zone, that is, uniformly in $n$ as $z\to\infty$. Also, in view of \eqref{20} and because $F_{\mu_\si}$ is strictly increasing, we see that \begin{equation*} |T_{\mu_n}^\si(z)|=O(1) \end{equation*} uniformly in $n$ for $|z|=O(1)$. So, in view of \eqref{50}, \begin{equation*} |T_{\mu_n}^\si(z)|=O\Big(\exp\frac{z^2}{3\si^2}\Big) \end{equation*} uniformly in $n$ and real $z$. So, for each fixed real $x$, \begin{equation*} |T^\si_{\mu_n}(x+y)|\exp(-y^2/2\si^2) =O\Big(\exp-\frac{y^2}{7\si^2}\Big) \end{equation*} uniformly in $n$ and real $y$.
Thus, by \eqref{10}, \eqref{30}, and dominated convergence, $J(\mu_n)\to J(\mu)$. $\quad\Box$