Most of what occurs to me has already been said, but you may find the following picture useful.
If $d_C$ is the Chebyshev metric on $R^2$, i.e. with points $\mathbf{p} = (x_1,y_1)$ and $\mathbf{q} = (x_2,y_2)$ in $R^2$,
$d_C(\mathbf{p,q}) := |x_1-x_2| \vee |y_1-y_2|$,
and $h_C$ is the Hausdorff metric on closed subsets of $R^2$ induced by $d_C$, i.e. with $A$ and $B$ being closed subsets of $R^2$,
$h_C(A,B):= \sup_{\mathbf{p} \in A} d_C(\mathbf{p},B) \vee \sup_{\mathbf{q} \in B} d_C(\mathbf{q},A)$,
where as usual $d_C(\mathbf{p},B) = \inf_{\mathbf{r} \in B} d_C(\mathbf{p,r})$ etc,
then the Levy metric between two distribution functions $F$ and $G$ is simply the Hausdorff distance $d_C$ between the closures of the completed graphs of $F$ and $G$.
Lemma 1: Let $\mu, \mu_1, \mu_2,\ldots \in \mathcal{M}$ and $g \in \mathcal C_b(X)$. If $\mu_i \to \mu$ weakly, then $\mu_i(A) \to \mu(A)$ for all Borel set $A \subseteq X$ with $\mu(\partial A) = 0$.
Lemma 2: If $X$ is separable and $\mu \in \mathcal{M}$. Then for each $\delta>0$ there are countably many open (or closed) balls $B_{1}, B_{2}, \ldots$ such that $\bigcup_{i=1}^{\infty} B_{i}=X$, the radius of $B_{i}$ is less than $\delta$, and $\mu\left(\partial B_{i}\right)=0$ for all $i$.
Fix $\varepsilon>0$. We want to show that $\exists N, \forall i \geq N: d_{P}\left(\mu_{i}, \mu\right) \leq \varepsilon$, i.e., $\mu_{i}(B) \leq \mu\left(B_{\varepsilon}\right)+\varepsilon$ and $\mu(B) \leq \mu_{i}\left(B_{\varepsilon}\right)+\varepsilon$ for all Borel subset $B$.
Fix $\delta \in (0, \varepsilon/4)$. By Lemma 2, there are countably many open balls $B_{1}, B_{2}, \ldots$ with radius less than $\delta/2$ such that $\bigcup_{i=1}^{\infty} B_{i}=X$ and $\mu\left(\partial B_{i}\right)=0$ for all $i$. Fix $k$ such that
$$
\mu\left(\bigcup_{j=1}^{k} B_{j}\right) \ge \mu(X)-\delta.
$$
Let $\mathcal A$ be the finite collection of subsets built by combining the balls $B_1, \ldots, B_k$, i.e.,
$$
\mathcal{A}:=\left\{\bigcup_{j \in I} B_{j} \,\middle\vert\, J \subset \{1, \ldots, k\}\right\}.
$$
We will use this collection to approximate any Borel set. For each $A \in \mathcal{A}, \partial A \subset \partial B_{1} \cup \cdots \cup \partial B_{k}$, so $\mu(\partial A) \leq$ $\mu\left(\partial B_{1}\right)+\cdots+\mu\left(\partial B_{k}\right)=0$. By Lemma 1, $\mu_{i}(A) \rightarrow \mu(A)$ for all $A \in \mathcal{A}$. Fix $N$ such that
$$
\left|\mu_{i}(A)-\mu(A)\right|<\delta \quad \forall i \geq N, \forall A \in \mathcal{A}.
$$
In particular,
$$
\mu_i \left(\bigcup_{j=1}^{k} B_{j}\right) \ge \mu \left(\bigcup_{j=1}^{k} B_{j}\right) -\delta \ge \mu(X) - 2 \delta \quad \forall i \ge N.
$$
Now we fix a Borel set $B$ and approximate it by
$$
A := \bigcup \{B_j \mid j = 1,\ldots,k \text{ such that } B_j \cap B \neq \emptyset\}.
$$
Then
- $A \subset B_{\delta} := \{x \mid d(x, B)<\delta\}$ because $\operatorname{diam} B_{j}<\delta$,
- $B=\left[B \cap \bigcup_{j=1}^{k} B_{j}\right] \cup\left[B \cap\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right] \subset \left [ A \cup\left(\bigcup_{j=1}^{k} B_{j}\right)^{c} \right ]$,
- $\left|\mu_{i}(A)-\mu(A)\right|<\delta$ for all $i \geq N$, and
- $\mu\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \leq \delta$ and $\mu_{i}\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \leq \mu_i(X)-\mu(X)+ 2 \delta \le \mu_i(X) + 3\delta$ for all $i \geq N$.
It follows that for every $i \geq N$ :
\begin{aligned}
\mu(B) & \leq \mu(A)+\mu\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \\
& \leq \mu(A)+\delta \\
& \leq \mu_{i}(A)+2 \delta \\
& \leq \mu_{i}\left(B_{\delta}\right)+2 \delta \\
&\leq \mu_{i}\left(B_{\varepsilon}\right)+\varepsilon \\
\mu_{i}(B) & \leq \mu_{i}(A)+\mu_{i}\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \\
&\leq \mu_{i}(A)+ 3 \delta \\
&\leq \mu(A)+4 \delta \\
& \leq \mu\left(B_{\delta}\right)+4 \delta \\
&\leq \mu\left(B_{\varepsilon}\right)+\varepsilon.
\end{aligned}
This is true for every $B \in \mathcal{B}$, so $d_{P}\left(\mu_{i}, \mu\right) \leq \varepsilon$ for all $i \geq N$.
Best Answer
On metric spaces, the Prokhorov metric can always be bounded by the total variation metric. If $S$ is finite, you can bound the total variation metric by the Prokhorov metric via the Wasserstein metric.
More metrics and their relations (including the ones above) are nicely summariced in Gibbs A.L. and Su F.E., On Choosing and Bounding Probability Metrics, International Statistical Review 70 (2002), pp. 419-435.