We assume that $U$ is centered and square integrable, and we denote by $\sigma^2>0$ its variance. Given $a\geq 0$, we denote by $\tau_a$ the hitting time of $[a,+\infty)$ by $X$ starting from $X_0=0$ (this formulation is of course equivalent, but more natural for the method used below). Fix $\varepsilon>0$ and let us prove that there exist constants $0<C<C'$ such that, for all $a$ large enough,
\begin{align*}
\mathbb{P}(C a^2< \tau_a \leq C' a^2+1)\geq 1-\varepsilon.
\end{align*}
We will use Donsker's invariance principle; there is no need to compute the law of the hitting times of Brownian motion precisely, nor to quantify the speed of convergence to Brownian motion.
Let $B$ be a standard one-dimensional Brownian motion and choose $C>0$ and $C'>C$ such that
\begin{align*}
\mathbb{P}(C < T_{1/\sigma}\leq C')\geq 1-\varepsilon/3,
\end{align*}
where $T_{1/\sigma}$ is the hitting time of $1/\sigma$ by the Brownian motion $B$.
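As an illustration only (the argument does not need it), such constants can be exhibited through the reflection principle, which gives the law of $T_b$ for $b>0$. Here $\Phi$ denotes the standard normal distribution function, a notation introduced only for this remark:
\begin{align*}
\mathbb{P}(T_b\leq t)&=2\,\mathbb{P}(B_t\geq b)=2\bigl(1-\Phi(b/\sqrt{t})\bigr),\\
\mathbb{P}(C< T_{1/\sigma}\leq C')&=2\,\Phi\Bigl(\frac{1}{\sigma\sqrt{C}}\Bigr)-2\,\Phi\Bigl(\frac{1}{\sigma\sqrt{C'}}\Bigr).
\end{align*}
The right-hand side tends to $2-1=1$ as $C\to 0$ and $C'\to\infty$, so it exceeds $1-\varepsilon/3$ for $C$ small enough and $C'$ large enough.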
Let $(\varphi_k)_{k\in\mathbb{N}}$ (resp. $(\psi_k)_{k\in\mathbb{N}}$) be an increasing (resp. decreasing) sequence of bounded continuous functions converging pointwise to $\mathbf{1}_{\cdot < 1/\sigma}$ (resp. $\mathbf{1}_{\cdot \leq 1/\sigma}$). We define the continuous functions $f_k$ and $g_k$ on $C([0,\infty))$ (with the topology defined p. 60 of Karatzas-Shreve) by
\begin{align*}
f_k(\omega)=\varphi_k(\max_{t\in[0,C]} \omega_t)
\text{ and }g_k(\omega)=\psi_k(\max_{t\in[0,C']} \omega_t).
\end{align*}
We thus have, almost surely,
\begin{align*}
\mathbf{1}_{C < T_{1/\sigma}\leq C'}=\lim_{k\rightarrow\infty} \bigl(f_k(B)-g_k(B)\bigr).
\end{align*}
Hence, by the dominated convergence theorem, we can choose $k_0$ such that
\begin{align*}
\mathbb{E}(f_{k_0}(B)-g_{k_0}(B))\geq 1-2\varepsilon/3.
\end{align*}
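For concreteness, here is one possible choice of the approximating sequences (a sketch; any other choice with the same monotonicity works equally well). Take the piecewise linear functions
\begin{align*}
\varphi_k(x)&=\min\bigl(1,\max(0,k(1/\sigma-x))\bigr),\\
\psi_k(x)&=\min\bigl(1,\max(0,1-k(x-1/\sigma))\bigr).
\end{align*}
Then $\varphi_k\uparrow\mathbf{1}_{\cdot<1/\sigma}$ and $\psi_k\downarrow\mathbf{1}_{\cdot\leq 1/\sigma}$ pointwise; the two limits differ only at the point $1/\sigma$, a value that $\max_{t\in[0,C']}B_t$ takes with probability zero. The almost sure limit above then follows from the identity
\begin{align*}
\mathbf{1}_{C<T_{1/\sigma}\leq C'}=\mathbf{1}_{\max_{t\in[0,C]}B_t<1/\sigma}-\mathbf{1}_{\max_{t\in[0,C']}B_t<1/\sigma},
\end{align*}
since $\{T_{1/\sigma}>s\}=\{\max_{t\in[0,s]}B_t<1/\sigma\}$ by continuity of $B$.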
For any $n\in\mathbb{N}$, let us define the piecewise affine process $X^{(n)}$ starting from $0$ and such that
\begin{align*}
X_t^{(n)}=\frac{1}{\sigma \sqrt{n}}Y_{nt},\text{ with }
Y_t=\sum_{i=1}^{\lfloor t\rfloor}U_i+(t-\lfloor t\rfloor)U_{\lfloor t\rfloor+1}.
\end{align*}
Denoting by $T^{(n)}_{1/\sigma}$ the first hitting time of $1/\sigma$ by $X^{(n)}$, it is clear that $a^2 T^{(a^2)}_{1/\sigma}\leq \tau_a < a^2 T^{(a^2)}_{1/\sigma}+1$. Hence
\begin{align*}
\mathbb{P}(C a^2< \tau_a \leq C' a^2+1)&\geq \mathbb{P}(C a^2< a^2 T^{(a^2)}_{1/\sigma}\leq C' a^2)\\
&= \mathbb{P}(C < T^{(a^2)}_{1/\sigma}\leq C')\\
&\geq \mathbb{E}(f_{k_0}(X^{(a^2)})-g_{k_0}(X^{(a^2)})).
\end{align*}
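The comparison between $\tau_a$ and $T^{(a^2)}_{1/\sigma}$ used above can be unpacked as follows, assuming (as the definitions suggest) that $a>0$, that $a^2$ is an integer, and that $X_k=Y_k=\sum_{i=1}^k U_i$ at integer times $k$. With $n=a^2$,
\begin{align*}
X^{(a^2)}_t=\frac{1}{\sigma a}Y_{a^2t},\quad\text{so that}\quad a^2\,T^{(a^2)}_{1/\sigma}=\inf\{s\geq 0:\ Y_s= a\}.
\end{align*}
Since $Y$ interpolates linearly between the values $Y_k$, with $Y_k<a$ for $k<\tau_a$ and $Y_{\tau_a}\geq a$, the path $Y$ first reaches the level $a$ in the interval $(\tau_a-1,\tau_a]$, which gives the two-sided bound on $\tau_a$.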
We know that the law of $(X_t^{(n)})_{t\geq 0}$ converges weakly on $C([0,\infty))$ to that of Brownian motion as $n\rightarrow\infty$ (see for instance Theorem~4.20 p.71 in Karatzas-Shreve), hence
\begin{align*}
\mathbb{E}(f_{k_0}(X^{(a^2)})-g_{k_0}(X^{(a^2)}))\xrightarrow[a\rightarrow\infty]{} \mathbb{E}(f_{k_0}(B)-g_{k_0}(B))\geq 1-2\varepsilon/3.
\end{align*}
As a consequence, there exists $a_0$ such that, for all $a\geq a_0$,
\begin{align*}
\mathbb{P}(C a^2< \tau_a \leq C' a^2+1)\geq 1-\varepsilon.
\end{align*}
$\newcommand{\Z}{\mathbb Z}\newcommand{\PP}{\mathcal D}\newcommand{\R}{\mathbb R}$Your function $d$ is not a metric, for two reasons: (i) there may be many processes $(X_t)_{t\in\Z}$ with the same distribution $P$ and (ii) your function $d$ does not take into account the values of $X_t$ for negative $t\in\Z$. So your $d$ is not a metric but only a pseudometric, which does not allow one to identify limits uniquely.
We can fix these deficiencies as follows: Let $\PP$ denote the set of the distributions of the processes in $\mathcal P$.
Given $P$ and $Q$ in $\PP$, for any natural $m$ let
\begin{equation}
P_m:=P\circ\pi_{-m,\dots,m}^{-1},\quad Q_m:=Q\circ\pi_{-m,\dots,m}^{-1},
\end{equation}
where $\pi_{r,\dots,s}((x_t)_{t\in\Z}):=(x_r,\dots,x_s)$ for any given integers $r,s$ such that $r\le s$.
Let
\begin{equation}
d(P,Q):=\sum_{m=1}^\infty d^{(m)}(P_m,Q_m)2^{-m},
\end{equation}
where $d^{(m)}$ is the Wasserstein metric of order $2$.
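To check that the series defining $d$ converges, here is a short bound. It assumes, as the rest of the argument suggests, that the processes in $\mathcal P$ are stationary with zero mean and finite variance, that $d^{(m)}$ is taken with respect to the Euclidean norm on $\R^{2m+1}$, and it writes $v_P:=\int_{\R^\Z} x_0^2\,P(dx)$ (a notation introduced only for this remark). Using the independent coupling $P_m\otimes Q_m$ and the inequality $|x-y|^2\le 2|x|^2+2|y|^2$,
\begin{equation}
d^{(m)}(P_m,Q_m)^2\le 2\sum_{t=-m}^{m}\Big(\int_{\R^\Z} x_t^2\,P(dx)+\int_{\R^\Z} x_t^2\,Q(dx)\Big)=2(2m+1)(v_P+v_Q),
\end{equation}
so that $d(P,Q)\le\sum_{m=1}^\infty 2^{-m}\sqrt{2(2m+1)(v_P+v_Q)}<\infty$.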
We then want to show that $\PP$ is closed with respect to the metric $d$.
Suppose now that we have a sequence $(P^{(n)})$ in $\PP$ such that $d(P^{(n)},Q)\to0$ (as $n\to\infty$) for some probability measure $Q$ (on the cylindrical $\sigma$-algebra) over $\R^\Z$. Then for each natural $m$ we have $d^{(m)}(P^{(n)}_m,Q_m)\to0$. So, by the well-known characterization of the convergence in the Wasserstein metric, $P^{(n)}_m\to Q_m$ weakly,
$\int_{\R^{\Z_m}} x_t^2\,Q_m(dx)=\lim_n\int_{\R^{\Z_m}} x_t^2\,P^{(n)}_m(dx)<\infty$, and
$\int_{\R^{\Z_m}} x_t\,Q_m(dx)=\lim_n\int_{\R^{\Z_m}} x_t\,P^{(n)}_m(dx)=\lim_n0=0$ for $t\in\Z_m:=\{-m,\dots,m\}$.
So, $\int_{\R^\Z} x_t\,Q(dx)=0$ and $\int_{\R^\Z} x_t^2\,Q(dx)<\infty$ for all $t\in\Z$, and
$P^{(n)}_{r,s}\to Q_{r,s}$ weakly for any given integers $r,s$ such that $r\le s$, where $P^{(n)}_{r,s}:=P^{(n)}\circ\pi_{r,\dots,s}^{-1}$ and $Q_{r,s}:=Q\circ\pi_{r,\dots,s}^{-1}$.
By stationarity, $P^{(n)}_{r+1,s+1}=P^{(n)}_{r,s}$ for all suitable $r,s,n$. Letting now $n\to\infty$, we conclude that $Q_{r+1,s+1}=Q_{r,s}$, so that $Q$ is the distribution of a stationary process. Also, as we saw, $\int_{\R^\Z} x_t\,Q(dx)=0$ and $\int_{\R^\Z} x_t^2\,Q(dx)<\infty$ for all $t\in\Z$. So, $Q\in\PP$.
We conclude that $\PP$ is closed, as desired.
Assume, naturally, that $y_0^n\to y_0$ in distribution (as $n\to\infty$) and that, for each $n$, $y_0^n$ is independent of $(u^n_t)$.
Then for each $T=0,1,\dots$ we have $Y^n_T\to Y_T$ in distribution, where $Y^n_T:=(y^n_0,\dots,y^n_T)$ and $Y_T:=(y_0,\dots,y_T)$. This follows because (say) for all $t=0,1,\dots$ $$y^n_t=a_n^t y^n_0+\sum_{k=1}^t a_n^{t-k}u_k$$ (which can be proved by induction on $t$), so that for all $T=0,1,\dots$ $$Y^n_T=(y^n_0,\dots,y^n_T)\to\Big(y_0,y_0+u_1,\dots,y_0+\sum_{k=1}^T u_k\Big) =(y_0,\dots,y_T)=Y_T$$ in distribution.
The latter convergence is equivalent to the convergence of $d(P_{Y^n},P_Y)$ to $0$, where $$d(P_{Y^n},P_Y):=\sum_{T=0}^\infty c_T\, \frac{d_{LP}(P_{Y^n_T},P_{Y_T})}{1+d_{LP}(P_{Y^n_T},P_{Y_T})},$$ where (i) $(c_T)_{T=0}^\infty$ is any summable sequence of positive numbers and (ii) $P_{Y^n_T}$ and $P_{Y_T}$ are, respectively, the distributions of $Y^n_T$ and $Y_T$ in $\mathbb R^{T+1}$ and $d_{LP}$ is the Lévy–Prokhorov distance between such distributions.
Other metrics on the set of all probability distributions over $\mathbb R^{T+1}$ can be used here in place of $d_{LP}$. If $y_0$ and the $y_0^n$'s are Gaussian, an especially convenient metric seems to be the Wasserstein $W_2$ metric, for which there is a rather simple explicit expression in the Gaussian case.
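For reference, the explicit expression alluded to is, to the best of my knowledge, the following. It applies when the vectors being compared are themselves Gaussian; $\mu_1,\mu_2$ and $\Sigma_1,\Sigma_2$ are placeholder names for the mean vectors and covariance matrices of two Gaussian distributions on $\mathbb R^{T+1}$, and $\|\cdot\|$ is the Euclidean norm: $$W_2\big(N(\mu_1,\Sigma_1),N(\mu_2,\Sigma_2)\big)^2=\|\mu_1-\mu_2\|^2+\operatorname{tr}\Big(\Sigma_1+\Sigma_2-2\big(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2}\big)^{1/2}\Big).$$ In dimension one this reduces to $(\mu_1-\mu_2)^2+(\sigma_1-\sigma_2)^2$, where $\sigma_1,\sigma_2$ are the standard deviations.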