Yes this is true, formally this follows by the envelope theorem. In an abstract and very smooth setting, the envelope theorem says that for an objective functional depending on a parameter $t$
$$
F(t)=\max\limits_z f(t,z),
$$
then the derivative of the optimal value can be computed as
$$
\frac{dF}{dt}(t)=\partial_t f(t,z_t)
\qquad
\mbox{for any smooth selection of a maximizer }z_t \mbox{ of }F(t).
$$
This can be seen easily: for any such choice of a maximizer, just apply a chain rule and use the optimality condition of $z_t$ in the maximization problem for fixed $t$:
$$
F'(t)=\frac d{dt}f(t,z_t)=\partial_t f(t,z_t)+\underbrace{\partial_zf(t,z_t)}_{=0}\frac {dz_t}{dt}.
$$
This means, roughly speaking, that one can simply forget that the minimizer varies, only the variation of the functional matter.
In your specific context, you are trying to differentiate (w.r.t $\rho$) the optimal value of the optimization problem given by the Kantorovich dual formulation
$$
W^2(\rho,\eta)
=F_\eta(\rho)
=\max\limits_\phi f_\eta(\rho,\phi)
=\max\limits_\phi \left\{\int \rho\phi+\int\eta\phi^c\right\}
$$
(here $\eta$ is fixed once and for all, I'm mimicking my $F,f$ notations above to give some perspective and I hope the notation is sufficiently self-explanatory).
Although the Kantorovich potential $\phi$ from $\rho$ to $\eta$ (the optimizer) varies when $\rho$ varies, the envelope theorem strongly suggests that you can actually argue as it did not vary at leading order (same for its $c$-transform $\phi^c$), and one can simply differentiate the functional w.r.t. the varying "parameter" $\rho$. Since the Kantorovich functional is linear in $\rho$, the conclusion is indeed that the first variation is given by $ \frac{\partial f_\eta}{\partial_\rho}(\rho,\phi)=\phi$.
Of course various subtle problems may arise owing essentially to the infinite-dimensional setting and functional-analytic details, but this is the rough idea.
For a completely rigorous statement and proof I can recommend Filippo Santambrogio's book [1], in particular chapter 7 and Proposition 7.17
[1] Santambrogio, Filippo. "Optimal transport for applied mathematicians." Birkäuser, NY 55.58-63 (2015): 94.
$\newcommand{\de}{\delta}\renewcommand{\S}{\mathcal S}\newcommand{\T}{\mathcal T}$The answer to your question is negative if $p<2$.
Indeed, let $\nu_i=\de_0$ for all $i$, where $\de_a$ is the Dirac measure supported on the singleton set $\{a\}$.
Let $X_1,\dots,X_n$ be independent random variables (r.v.'s) with respective distributions $\mu_1,\dots,\mu_n$. Let $Y_i:=X_i^2$ for all $i$.
Let
\begin{equation*}
\text{$\mu:=\bigotimes_1^n\mu_i$ and $\nu:=\bigotimes_1^n\nu_i$.}
\end{equation*}
Then $W_p(\mu_i,\nu_i)=(E|X_i|^p)^{1/p}$ and
$W_p(\mu,\nu)=(E(\sum_1^n X_i^2)^{p/2})^{1/p}$.
So, the inequality in question becomes
\begin{equation*}
L:=\Big\|\sum_1^n Y_i\Big\|_{p/2}\overset{\text{(?)}}\le
\sum_1^n \|Y_i\|_{p/2}=:R.
\end{equation*}
Suppose now that $n=2$ and $Y_1,Y_2$ are independent r.v.'s such that $P(Y_i=1)=t=1-P(Y_i=0)$ for $i=1,2$, where $t\downarrow0$.
Then $L=(2t(1-t)+t^2 2^{p/2})^{2/p}\sim(2t)^{2/p}$, whereas $R=2t^{2/p}$, so that the inequality $L\le R$ fails to hold if $p<2$ and $t$ is small enough. $\quad\Box$
I have just gotten up during the night and it occurred to me how to show, very simply, that the inequality in question holds for $p\ge2$, and then I saw the answer by the OP :-), which is correct, except it had to be mentioned there that the pairs $(X_i,Y_i)$ may be assumed to be independent.
Actually, we have the following somewhat more general fact. Let $p\in[2,\infty)$. For each $i\in[n]:=\{1,\dots,n\}$, let $(S_i,\S_i,\mu_i)$ and $(T_i,\T_i,\nu_i)$ be probability spaces. Let $(S,\S,\mu):=\bigotimes_{i\in[n]}(S_i,\S_i,\mu_i)$ and $(T,\T,\nu):=\bigotimes_{i\in[n]}(T_i,\T_i,\nu_i)$. For each $i\in[n]$, let $\Pi(\mu_i,\nu_i)$ denote the set of all probability measures on the measurable space $(S_i\times T_i,\S_i\otimes\T_i)$ with marginals $\mu_i$ and $\nu_i$.
Let $\Pi(\mu,\nu)$ denote the set of all probability measures on the measurable space $(S\times T,\S\otimes\T)$ with marginals $\mu$ and $\nu$. For each $i\in[n]$, let $f_i\colon S_i\times T_i\to[0,\infty)$ be a measurable function. Let $f(s,t):=\sqrt{\sum_{i\in[n]} f_i(s_i,t_i)^2}$ for all $s=(s_1,\dots,s_n)\in S= S_1\times\cdots\times S_n$ and $t=(t_1,\dots,t_n)\in T=T_1\times\cdots\times T_n$. For each $i\in[n]$, let
\begin{equation*}
W_p(\mu_i,\nu_i):=\inf_{\pi_i\in\Pi(\mu_i,\nu_i)}
\Big(\int_{S_i\times T_i}f_i(s_i,t_i)^p\,\pi_i(ds_i,dt_i)\Big)^{1/p}.
\end{equation*}
Let
\begin{equation*}
W_p(\mu,\nu):=\inf_{\pi\in\Pi(\mu,\nu)}
\Big(\int_{S\times T}f(s,t)^p\,\pi(ds,dt)\Big)^{1/p}.
\end{equation*}
Then
\begin{equation*}
W_p(\mu,\nu)\overset{\text{(?)}}\le \Big(\sum_{i\in[n]} W_p(\mu_i,\nu_i)^2\Big)^{1/2}. \tag{1}\label{1}
\end{equation*}
Indeed, take any real $w_1,\dots,w_n$ such that $W_p(\mu_i,\nu_i)<w_i$ (if such $w_1,\dots,w_n$ exist; otherwise, inequality \eqref{1} is trivial). Then for each $i\in[n]$ there exists some $\pi_i\in\Pi(\mu_i,\nu_i)$ such that
\begin{equation*}
\Big(\int_{S_i\times T_i}f_i(s_i,t_i)^p\,\pi_i(ds_i,dt_i)\Big)^{1/p}<w_i. \tag{2}\label{2}
\end{equation*}
Let
$$\pi:=\bigotimes_{i\in[n]}\pi_i.$$
Then $\pi\in\Pi(\mu,\nu)$. Moreover, for each $i\in[n]$
\begin{equation*}
\Big(\int_{S_i\times T_i}f_i(s_i,t_i)^p\,\pi_i(ds_i,dt_i)\Big)^{2/p}
=\|g_i\|_{L^{p/2}(\pi)},
\end{equation*}
where $g_i(s,t):=f_i(s_i,t_i)^2$ for for all $s=(s_1,\dots,s_n)\in S= S_1\times\cdots\times S_n$ and $t=(t_1,\dots,t_n)\in T=T_1\times\cdots\times T_n$, and
\begin{equation*}
\Big(\int_{S\times T}f(s,t)^p\,\pi(ds,dt)\Big)^{2/p}
=\|g\|_{L^{p/2}(\pi)},
\end{equation*}
where $g:=f^2=g_1+\cdots+g_n$.
So, by the condition $p\in[2,\infty)$, Minkowski's inequality, and \eqref{2},
\begin{multline*}
W_p(\mu,\nu)^2\le \Big(\int_{S\times T}f(s,t)^p\,\pi(ds,dt)\Big)^{2/p}
=\|g\|_{L^{p/2}(\pi)} \\
\le\sum_{i\in[n]} \|g_i\|_{L^{p/2}(\pi)}
=\sum_{i\in[n]} \Big(\int_{S_i\times T_i}f_i(s_i,t_i)^p\,\pi_i(ds_i,dt_i)\Big)^{2/p}
<\sum_{i\in[n]}w_i^2.
\end{multline*}
Letting now $w_i\downarrow W_p(\mu_i,\nu_i)$ for each $i\in[n]$, we get \eqref{1}. $\quad\Box$
Best Answer
Referring to the proof of Prop 3.21 in Malrieu 2001, after the triangle inequality is applied twice, two of the terms are bounded via the upper bound $$ W_2(u_t,u_t^{(1,N)}) \vee W_2(\mu_{1,N},\bar{u}) \le \sup_{s \ge 0} \sqrt{E|X_s^{1,N}-\bar{X}_s^1|^2} \tag{$\star$}$$ which accounts for the factor $2$. The reason this bound holds is because of the supremum over $s$, which indicates that it holds for all $s\ge0$ including $s=t$ and $s=\infty$ which gives (via the coupling characterization of the 2 Wasserstein distance) the upper bounds in ($\star$) on $W_2(u_t,u_t^{(1,N)})$ and $W_2(\mu_{1,N},\bar{u}) $, respectively. To finish, Theorem 3.3 is invoked.
For the related $L^1$ bound between the densities, one uses the analogous upper bound $$ \|u_t-u_t^{(1,N)}\|_1 \vee \|\mu_{1,N}-\bar{u}\|_1 \le \sup_{s \ge 0} \|u_s-u_s^{(1,N)}\|_1 \;. $$ To finish, Prop. 3.13 is invoked with $k=1$.