Such a bound can be derived from Girsanov's theorem and Pinsker's inequality. Let $X_t = x_0 + W_t$ under a reference measure $P$. Supposing each $b_i$ has linear growth in $x$ (enough to make the stochastic exponentials below true martingales), we may define measures $P_i$, $i=1,2$, by
$\frac{dP_i}{dP} = \exp\left(\int_0^T b_i(t,X_t)\,dW_t - \frac{1}{2}\int_0^T|b_i(t,X_t)|^2\,dt\right)$.
Then, under $P_i$, $W^i_t := W_t - \int_0^t b_i(s,X_s)\,ds$ is a Brownian motion, and so $X$ is a weak solution of the SDE $dX_t = b_i(t,X_t)\,dt + dW^i_t$, $X_0 = x_0$. Let $P_i \circ X^{-1}$ denote the $P_i$-law of the entire process, a measure on the space of continuous functions. Then
$\frac{dP_2}{dP_1} = \exp\left(\int_0^T(b_2(t,X_t) - b_1(t,X_t))\,dW^1_t - \frac{1}{2}\int_0^T|b_2(t,X_t) - b_1(t,X_t)|^2\,dt\right)$.
Let $\mathcal{H}(\cdot | \cdot)$ denote the relative entropy. By Pinsker's inequality,
$$
\begin{aligned}
d_{TV}^2(P_1 \circ X^{-1}, P_2 \circ X^{-1}) &= d_{TV}^2(P_1, P_2) \le 2\mathcal{H}(P_1 \,|\, P_2) \\
&= -2\,\mathbb{E}^{P_1}\left[\log \frac{dP_2}{dP_1}\right] \\
&= \mathbb{E}^{P_1}\left[\int_0^T|b_1(s,X_s) - b_2(s,X_s)|^2\,ds\right] \\
&= \mathbb{E}^{P}\left[\frac{dP_1}{dP}\int_0^T|b_1(s,X_s) - b_2(s,X_s)|^2\,ds\right]
\end{aligned}
$$
since the stochastic integral term is a true martingale and hence has zero mean under $P_1$. This is actually much stronger control than you asked for, since it is easy to see that $d_{TV}(P_1 \circ X_t^{-1}, P_2 \circ X_t^{-1}) \le d_{TV}(P_1 \circ X^{-1}, P_2 \circ X^{-1})$ for any $t \le T$. If you have a uniform bound on $|b_1 - b_2|$, you get a good bound on the TV distance between your processes. You can probably get away with an $L^2$ bound, but you'll then need to fuss with the $dP_1/dP$ term a bit. Hope this helps!
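Here is a minimal Monte Carlo sketch of this bound. The drifts `b1`, `b2` and all parameters are illustrative assumptions, not from the question: it simulates $X = x_0 + W$ under the reference measure $P$, accumulates the Girsanov weight $dP_1/dP$ along each path, and reweights the time integral as in the last line of the display above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical drifts, chosen only for illustration.
b1 = lambda t, x: np.sin(x)
b2 = lambda t, x: np.sin(x) + 0.1   # so |b1 - b2| = 0.1 uniformly

T, n_steps, n_paths, x0 = 1.0, 500, 20_000, 0.0
dt = T / n_steps

x = np.full(n_paths, x0)
log_weight = np.zeros(n_paths)   # accumulates log(dP1/dP) along each path
integral = np.zeros(n_paths)     # accumulates int_0^T |b1 - b2|^2 dt

for k in range(n_steps):
    t = k * dt
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    drift = b1(t, x)
    log_weight += drift * dW - 0.5 * drift**2 * dt   # Girsanov exponent
    integral += (b1(t, x) - b2(t, x))**2 * dt
    x += dW                       # X_t = x_0 + W_t under the reference measure P

# E^{P_1}[int |b1 - b2|^2 dt] = E^P[(dP1/dP) * int |b1 - b2|^2 dt]
bound_sq = np.mean(np.exp(log_weight) * integral)
print(np.sqrt(bound_sq))   # upper bound on d_TV of the path laws, approx 0.1 here
```

Since $|b_1 - b_2| \equiv 0.1$ and $T = 1$ in this toy setup, the exact value of the bound is $0.1\sqrt{\mathbb{E}^P[dP_1/dP]} = 0.1$, which the Monte Carlo average should reproduce up to sampling error.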
EDIT 1: It's interesting to note that this approach breaks down if you have two different volatility coefficients in your SDE, because the laws of the processes are then singular. But you could still bound the TV distance between the time-$t$ laws, as requested, probably via Malliavin calculus.
EDIT 2: I should elaborate on the first equality in the string above. Since $X$ and $W$ generate the same $\sigma$-fields, $dP_i/dP$ is $X$-measurable, and so
$\frac{dP_i \circ X^{-1}}{dP \circ X^{-1}}(X) = \mathbb{E}\left[\frac{dP_i}{dP}\,\Big|\,X\right] = \frac{dP_i}{dP}$.
From this it is clear that the TV distances above agree, by the formula $d_{TV}(\mu,\nu) = \int \left|\frac{d\mu}{d\lambda} - \frac{d\nu}{d\lambda}\right|d\lambda$ for $\mu,\nu \ll \lambda$.
I have something that may or may not be useful...
Diaconis notes an interpretation of the variation distance due to Paul Switzer. Consider $\mu,\nu\in M_p(S)$. A single observation $o$ is drawn from $\mu$ or $\nu$, each with probability $1/2$, and you must guess which distribution produced it. The following classical strategy guesses correctly with probability $\frac12(1+\|\mu-\nu\|)$:
- Evaluate $\mu(o)$ and $\nu(o)$.
- If $\mu(o)\geq\nu(o)$, choose $\mu$.
- If $\nu(o)>\mu(o)$, choose $\nu$.
To see this is true, let $\{\mu>\nu\}$ be the set $\{t\in S:\mu(t)>\nu(t)\}$.
Suppose $o$ is sampled from $\mu$. Then the strategy is correct if $o\in\{\mu=\nu\}$ or $o\in\{\mu>\nu\}$:
$$\mathbb{P}[\text{guessing correctly}\,|\,\mu]=\mathbb{P}[o\in\{\mu=\nu\}\,|\,\mu]+\mathbb{P}[o\in\{\mu>\nu\}\,|\,\mu]$$
with a similar expression for $\mathbb{P}[\text{guessing correctly}\,|\,\nu]$.
Note that $\mathbb{P}[o\in\{\mu=\nu\}]=\mu(\{\mu=\nu\})=\nu(\{\mu=\nu\})$ and also $\mathbb{P}[o\in\{\mu>\nu\}\,|\,\mu]=\mu(\{\mu>\nu\})$ (and similar for $o\in\{\mu<\nu\}$). Thus
\begin{align*}
\mathbb{P}[\text{guessing correctly}] &=\frac12\mathbb{P}[\text{guessing correctly}\,|\,\mu]+\frac12\mathbb{P}[\text{guessing correctly}\,|\,\nu]
\\&=\frac12\left(\nu(\{\mu=\nu\})+\mu(\{\mu>\nu\})\right)+\frac12\left(\nu(\{\mu<\nu\})\right)
\end{align*}
It is easily shown that
$$\|\mu-\nu\|=\mu\left(\{\mu>\nu\}\right)-\nu\left(\{\mu>\nu\}\right).$$
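For completeness, this identity can be verified directly (this step is my addition, for $S$ discrete, with $\|\mu-\nu\| = \max_{A\subseteq S}(\mu(A)-\nu(A))$ as in Diaconis):
\begin{align*}
\max_{A\subseteq S}\big(\mu(A)-\nu(A)\big)
&=\max_{A\subseteq S}\sum_{t\in A}\big(\mu(t)-\nu(t)\big)\\
&=\sum_{t:\,\mu(t)>\nu(t)}\big(\mu(t)-\nu(t)\big)
=\mu(\{\mu>\nu\})-\nu(\{\mu>\nu\}),
\end{align*}
since the sum is maximized by including exactly those $t$ with a positive summand.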
Hence
$$
\mathbb{P}[\text{guessing correctly}]=\frac12\left(\underbrace{\nu(\{\mu=\nu\})+\nu(\{\mu>\nu\})+\nu(\{\mu<\nu\})}_{=1}+\|\mu-\nu\|\right).$$
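This can be checked empirically. The distributions below are an illustrative choice, not from the text, and $\|\mu-\nu\|$ is computed as half the $L^1$ distance, which agrees with the max-over-events convention used here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical distributions on a 4-point space (illustrative choice).
mu = np.array([0.4, 0.3, 0.2, 0.1])
nu = np.array([0.1, 0.2, 0.3, 0.4])

# ||mu - nu|| = max_A (mu(A) - nu(A)) = half the L1 distance.
tv = 0.5 * np.abs(mu - nu).sum()

n = 200_000
from_mu = rng.random(n) < 0.5                  # fair coin: which source?
obs = np.where(from_mu,
               rng.choice(4, n, p=mu),
               rng.choice(4, n, p=nu))

guess_mu = mu[obs] >= nu[obs]                  # Switzer's strategy
correct = guess_mu == from_mu
print(correct.mean(), 0.5 * (1 + tv))          # empirical vs. predicted success rate
```

With these choices $\|\mu-\nu\| = 0.4$, so the strategy should be correct about $70\%$ of the time, matching $\frac12(1+\|\mu-\nu\|)$.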
Best Answer
Since your state space is finite, $\|p_n-p\|\to 0$ and $\|p_n'-p'\|\to 0$ with exponentially decaying tail probabilities (this is just finite-alphabet large deviations; see, for example, Section 2.1 of the Dembo-Zeitouni large deviations book). Concretely, $P(\|p_n-p\|>\delta)\leq (n+1)^{|S|} e^{-n I(\delta)}$, where the rate $I$ also depends on the size of the sample space. This immediately gives you (1) with $\epsilon$ decaying exponentially in $n$ (though not uniformly in $|S|$).
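As a quick empirical illustration (all parameters below are my own choices, and I take $\|\cdot\|$ to be total variation, i.e. half the $L^1$ norm), one can watch the tail probability $P(\|p_n - p\| > \delta)$ collapse as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true distribution p on a 3-letter alphabet.
p = np.array([0.5, 0.3, 0.2])
delta, trials = 0.1, 5_000

def tail_prob(n):
    """Monte Carlo estimate of P(||p_n - p|| > delta), ||.|| = half the L1 norm."""
    samples = rng.multinomial(n, p, size=trials) / n   # empirical distributions p_n
    tv = 0.5 * np.abs(samples - p).sum(axis=1)
    return (tv > delta).mean()

for n in (50, 100, 200, 400):
    print(n, tail_prob(n))   # tail probability decays rapidly in n
```

The printed probabilities shrink quickly with $n$, consistent with the exponential bound above (up to the polynomial prefactor).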