Lemma: Let $f:X \to Y$ be a map and $C$ a collection of subsets of $Y$. Then $$f^{-1}[\sigma(C)]=\sigma(f^{-1}[C]),$$ where $f^{-1}[C] := \{f^{-1}(B) \mid B \in C\}$.
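Since the Lemma is purely set-theoretic, it can be sanity-checked by brute force when $X$ and $Y$ are finite. In that case $\sigma(C)$ is just the closure of $C$ under complements and finite unions. The Python sketch below assumes finite sets. The map `f`, the collection `C` and the helper names are hypothetical.

```python
def generated_sigma_algebra(universe, collection):
    """Smallest family of subsets of a finite `universe` that contains
    `collection` and is closed under complements and unions.
    On a finite set this is exactly sigma(collection)."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(c) for c in collection}
    while True:
        new = {universe - a for a in family} | {a | b for a in family for b in family}
        if new <= family:  # nothing new was produced: the family is closed
            return family
        family |= new


def preimage(f, subset):
    """f^{-1}(subset) for a map f given as a dict x -> f(x)."""
    return frozenset(x for x, y in f.items() if y in subset)


# Hypothetical example: X = {0,...,5}, Y = {0,...,3}.
f = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 5: 3}
C = [{0, 1}, {1, 2}]

lhs = {preimage(f, B) for B in generated_sigma_algebra(range(4), C)}  # f^{-1}[sigma(C)]
rhs = generated_sigma_algebra(range(6), [preimage(f, c) for c in C])  # sigma(f^{-1}[C])
print(lhs == rhs, len(lhs))
```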
Let $\mathcal O$ be the standard topology of $\mathbb R$. Then $\sigma(\mathcal O)$ is the Borel $\sigma$-algebra of $\mathbb R$. Let $\mathcal C(E)$ be the space of all continuous functionals on $E$. Let
$$
f^{-1}[\sigma(\mathcal O)] := \{f^{-1}(B) \mid B \in \sigma(\mathcal O)\} \quad \forall f \in \mathcal C(E).
$$
By definition, $\mathcal A$ is the $\sigma$-algebra generated by
\begin{align}
\bigcup_{f \in \mathcal C(E)} f^{-1}[\sigma(\mathcal O)].
\end{align}
We have
\begin{align}
\mathcal A &= \sigma \left ( \bigcup_{f \in \mathcal C(E)} f^{-1}[\sigma( \mathcal O )] \right ) \\
&= \sigma \left ( \bigcup_{f \in \mathcal C(E)} \sigma( f^{-1}[\mathcal O] ) \right ) \quad \text{by our Lemma} \\
& \subseteq \sigma \left ( \sigma \left ( \bigcup_{f \in \mathcal C(E)} f^{-1}[\mathcal O] \right) \right ) \\
&= \sigma \left ( \bigcup_{f \in \mathcal C(E)} f^{-1}[\mathcal O] \right) \\
&\subseteq \mathcal B(E) \quad \text{since } f^{-1}(U) \text{ is open for all } f \in \mathcal C(E),\ U \in \mathcal O.
\end{align}
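Conversely, every metric space is perfectly normal: for each closed subset $F$ of $E$ there is a continuous functional $f:E \to [0, 1]$ such that $F =f^{-1}(0)$ (for $F \neq \emptyset$ take $f(x) = \min\{d(x,F), 1\}$, and for $F = \emptyset$ take $f \equiv 1$). Since $\{0\}$ is a Borel set, $F = f^{-1}(\{0\}) \in \mathcal A$. Hence $\mathcal A$ contains all closed and thus all open subsets of $E$, so $\mathcal B(E) \subseteq \mathcal A$. Together with the inclusion above, $\mathcal A = \mathcal B(E)$. This completes the proof.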
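To make the perfect-normality step concrete: for nonempty closed $F$, the function $f(x)=\min\{d(x,F),1\}$ vanishes exactly on $F$. The Python sketch below assumes the special case $E=\mathbb R$ and takes $F$ to be a finite union of closed intervals. This is a hypothetical choice made so that $d(x,F)$ has a closed form.

```python
def dist_to_closed_set(x, intervals):
    """d(x, F) for F a finite union of closed intervals [a, b] in R."""
    return min(max(a - x, 0.0, x - b) for a, b in intervals)


def urysohn_function(x, intervals):
    """f(x) = min{d(x, F), 1}: continuous, values in [0, 1], and f^{-1}(0) = F."""
    return min(dist_to_closed_set(x, intervals), 1.0)


# Hypothetical closed set F = [0, 1] ∪ {3} in E = R.
F = [(0.0, 1.0), (3.0, 3.0)]
for x in (-2.0, 0.5, 1.5, 2.0, 3.0):
    print(x, urysohn_function(x, F))
```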
Lemma 1: Let $\mu, \mu_1, \mu_2,\ldots \in \mathcal{M}$. If $\mu_i \to \mu$ weakly, then $\mu_i(A) \to \mu(A)$ for every Borel set $A \subseteq X$ with $\mu(\partial A) = 0$.
Lemma 2: Let $X$ be separable and $\mu \in \mathcal{M}$. Then for each $\delta>0$ there are countably many open (or closed) balls $B_{1}, B_{2}, \ldots$ such that $\bigcup_{i=1}^{\infty} B_{i}=X$, the radius of $B_{i}$ is less than $\delta$, and $\mu\left(\partial B_{i}\right)=0$ for all $i$.
Fix $\varepsilon>0$. We want to show that $\exists N, \forall i \geq N: d_{P}\left(\mu_{i}, \mu\right) \leq \varepsilon$, i.e., $\mu_{i}(B) \leq \mu\left(B_{\varepsilon}\right)+\varepsilon$ and $\mu(B) \leq \mu_{i}\left(B_{\varepsilon}\right)+\varepsilon$ for all $i \geq N$ and all Borel sets $B \subseteq X$, where $B_\varepsilon := \{x \mid d(x, B) < \varepsilon\}$.
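For finitely supported measures on $\mathbb R$, the defining condition of $d_P$ above can be checked by brute force, which makes the definition concrete. The Python sketch below is a hypothetical illustration. It assumes $X=\mathbb R$ and measures given as finite dictionaries of point masses, and it approximates $d_P$ by bisection, using the fact that the condition only gets easier as $\varepsilon$ grows. Its cost is exponential in the support size, so it is only meant for tiny examples.

```python
from itertools import combinations


def prokhorov_distance(mu, nu, tol=1e-9):
    """Bisection approximation of d_P(mu, nu) for finitely supported measures
    on the real line, given as dicts {point: mass}.

    It suffices to test sets B inside the support S of the left-hand measure P:
    P(B) = P(B ∩ S) while (B ∩ S)_eps ⊆ B_eps, so the condition for B ∩ S
    implies the condition for B.
    """
    def mass_of_neighborhood(measure, B, eps):
        # measure(B_eps) with B_eps = {x : d(x, B) < eps}
        return sum(w for x, w in measure.items() if any(abs(x - b) < eps for b in B))

    def admissible(eps):
        # checks P(B) <= Q(B_eps) + eps for both orderings (P, Q)
        for P, Q in ((mu, nu), (nu, mu)):
            support = list(P)
            for r in range(1, len(support) + 1):
                for B in combinations(support, r):
                    if sum(P[b] for b in B) > mass_of_neighborhood(Q, B, eps) + eps:
                        return False
        return True

    # admissible(eps) is monotone in eps and holds at eps = max total mass.
    lo, hi = 0.0, max(sum(mu.values()), sum(nu.values()))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi


print(prokhorov_distance({0.0: 1.0}, {0.3: 1.0}))
print(prokhorov_distance({0.0: 1.0}, {0.0: 0.9, 10.0: 0.1}))
```

For two Dirac masses, the binding test set is $B=\{x\}$, which gives $d_P(\delta_x,\delta_y)=\min\{d(x,y),1\}$. The first call therefore approximates $0.3$. The second approximates $0.1$, the amount of mass moved far away.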
Fix $\delta \in (0, \varepsilon/4)$. By Lemma 2, there are countably many open balls $B_{1}, B_{2}, \ldots$ with radius less than $\delta/2$ such that $\bigcup_{i=1}^{\infty} B_{i}=X$ and $\mu\left(\partial B_{i}\right)=0$ for all $i$. Since $\bigcup_{j=1}^{k} B_{j} \uparrow X$ as $k \to \infty$, continuity from below lets us fix $k$ such that
$$
\mu\left(\bigcup_{j=1}^{k} B_{j}\right) \ge \mu(X)-\delta.
$$
Let $\mathcal A$ be the finite collection of all unions of the balls $B_1, \ldots, B_k$, i.e.,
$$
\mathcal{A}:=\left\{\bigcup_{j \in J} B_{j} \,\middle\vert\, J \subseteq \{1, \ldots, k\}\right\}.
$$
We will use this collection to approximate arbitrary Borel sets. For each $A \in \mathcal{A}$ we have $\partial A \subseteq \partial B_{1} \cup \cdots \cup \partial B_{k}$, so $\mu(\partial A) \leq \mu\left(\partial B_{1}\right)+\cdots+\mu\left(\partial B_{k}\right)=0$. By Lemma 1, $\mu_{i}(A) \rightarrow \mu(A)$ for every $A \in \mathcal{A}$. Since $\mathcal A$ has at most $2^k$ elements, we can fix $N$ such that
$$
\left|\mu_{i}(A)-\mu(A)\right|<\delta \quad \forall i \geq N, \forall A \in \mathcal{A}.
$$
In particular,
$$
\mu_i \left(\bigcup_{j=1}^{k} B_{j}\right) \ge \mu \left(\bigcup_{j=1}^{k} B_{j}\right) -\delta \ge \mu(X) - 2 \delta \quad \forall i \ge N.
$$
Now we fix a Borel set $B$ and approximate it by
$$
A := \bigcup \{B_j \mid j = 1,\ldots,k \text{ such that } B_j \cap B \neq \emptyset\}.
$$
Then
- $A \subseteq B_{\delta} := \{x \mid d(x, B)<\delta\}$, because each $B_j$ in the union meets $B$ and has $\operatorname{diam} B_{j}<\delta$,
- $B=\left[B \cap \bigcup_{j=1}^{k} B_{j}\right] \cup\left[B \cap\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right] \subset \left [ A \cup\left(\bigcup_{j=1}^{k} B_{j}\right)^{c} \right ]$,
- $\left|\mu_{i}(A)-\mu(A)\right|<\delta$ for all $i \geq N$, and
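- $\mu\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \leq \delta$ and $\mu_{i}\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \leq \mu_i(X)-\mu(X)+ 2 \delta \le 3\delta$ for all $i \geq N$. For the last inequality, note that $\partial X = \emptyset$, so Lemma 1 gives $\mu_i(X) \to \mu(X)$; enlarging $N$ if necessary, we may assume $\mu_i(X) \le \mu(X) + \delta$ for all $i \ge N$.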
It follows that for every $i \geq N$:
$$
\begin{aligned}
\mu(B) & \leq \mu(A)+\mu\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \\
& \leq \mu(A)+\delta \\
& \leq \mu_{i}(A)+2 \delta \\
& \leq \mu_{i}\left(B_{\delta}\right)+2 \delta \\
&\leq \mu_{i}\left(B_{\varepsilon}\right)+\varepsilon \\
\mu_{i}(B) & \leq \mu_{i}(A)+\mu_{i}\left(\left(\bigcup_{j=1}^{k} B_{j}\right)^{c}\right) \\
&\leq \mu_{i}(A)+ 3 \delta \\
&\leq \mu(A)+4 \delta \\
& \leq \mu\left(B_{\delta}\right)+4 \delta \\
&\leq \mu\left(B_{\varepsilon}\right)+\varepsilon.
\end{aligned}
$$
This holds for every Borel set $B \subseteq X$, so $d_{P}\left(\mu_{i}, \mu\right) \leq \varepsilon$ for all $i \geq N$.
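The approximation of $B$ by $A$ can be checked on a toy instance. The Python sketch below is hypothetical. It assumes $X=\mathbb R$, $\delta = 0.2$, $k=11$ balls of radius $0.09 < \delta/2$ centred at $0, 0.1, \ldots, 1$, and a finite point set standing in for the Borel set $B$. It computes the index set of balls meeting $B$ and checks the first two bullet points above on sample points.

```python
def indices_meeting(centers, r, B):
    """J = {j : B_j ∩ B ≠ ∅} for balls B_j = (c_j - r, c_j + r) and a finite set B."""
    return [j for j, c in enumerate(centers) if any(abs(b - c) < r for b in B)]


def in_union(x, centers, r, indices):
    """Whether x lies in the union of the balls B_j, j in indices."""
    return any(abs(x - centers[j]) < r for j in indices)


# Hypothetical data in X = R: delta = 0.2, radius r = 0.09 < delta / 2,
# k = 11 balls centred at 0, 0.1, ..., 1, and a finite "Borel set" B.
delta, r = 0.2, 0.09
centers = [i / 10 for i in range(11)]
B = [0.33, 0.72, 2.5]

J = indices_meeting(centers, r, B)  # A = union of B_j over j in J

# First bullet: A ⊆ B_delta, checked on sample points of every ball in A.
for j in J:
    for s in range(1, 100):
        x = centers[j] - r + 2 * r * s / 100
        assert min(abs(x - b) for b in B) < delta

# Second bullet: every point of B lies in A or outside all k balls.
for b in B:
    assert in_union(b, centers, r, J) or not in_union(b, centers, r, range(len(centers)))

print(J)
```

The point $2.5$ lies outside all $k$ balls, so it is covered by the complement term rather than by $A$.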
Lemma 1: If $X$ is separable, then convergence in $d_P$ is equivalent to weak convergence.
Lemma 2: Let $\mu, \mu_1,\mu_2,\ldots \in \mathcal M$. Then $\mu_i \to \mu$ weakly if and only if $\int f \mathrm d \mu_i \to \int f \mathrm d \mu$ for all uniformly continuous and bounded functionals $f$.
Notice that $x \mapsto \delta_{x}$ is a homeomorphism from $X$ onto $\left\{\delta_{x} \mid x \in X\right\}$. If $\mathcal{M}$ is separable, then so is its subspace $\left\{\delta_{x} \mid x \in X\right\}$ (every subspace of a separable metric space is separable), and hence so is $X$. Let's prove the other direction. Let $D = \{a_1, a_2, \ldots\}$ be a countable dense subset of $X$, and let $$ \mathcal D := \{\alpha_1 \delta_{b_1} + \cdots + \alpha_k \delta_{b_k} \mid k \ge 1,\ b_1, \ldots, b_k \in D,\ \alpha_1,\ldots, \alpha_k \in \mathbb Q_{\ge 0}\}. $$
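Then $\mathcal D$ is countable, being a countable union (over $k$) of countable sets. Let's prove that $\mathcal D$ is dense in $\mathcal{M}$. Fix $\mu \in \mathcal{M}$. For each $m\ge 1$, the balls $B(a_j, 1/m)$, $j \ge 1$, cover $X$ because $D$ is dense, so by continuity from below we can pick $k_m$ such that $$ \mu \left ( \bigcup_{j=1}^{k_m} B(a_j, 1/m) \right ) \ge \mu(X)-1/m. $$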
Let $A_{1}^{m} := B\left(a_{1}, 1/m \right)$ and $$ A_{j}^{m} := B\left(a_{j}, 1 / m\right) \setminus \bigcup_{i=1}^{j-1} B\left(a_{i}, 1 / m\right) \quad \forall j=2, \ldots, k_{m}. $$
Then the sets $A_1^m, \ldots, A_{k_m}^m$ are pairwise disjoint, and their union is equal to $\bigcup_{i=1}^{k_m} B\left(a_{i}, 1 / m\right)$ for all $m\ge 1$. In particular, $$ \mu(X) \ge\sum_{j=1}^{k_{m}} \mu\left(A_{j}^{m}\right) \ge \mu(X)-1/m \quad \forall m \ge 1. $$
We approximate $$ \mu\left(A_{1}^{m}\right) \delta_{a_{1}}+\cdots+\mu\left(A_{k_{m}}^{m}\right) \delta_{a_{k_{m}}} \quad \text{by} \quad \mu_{m}:=\alpha_{1}^{m} \delta_{a_{1}}+\cdots+\alpha_{k_{m}}^{m} \delta_{a_{k_{m}}} $$ such that $\alpha_{1}^{m}, \ldots, \alpha_{k_{m}}^{m} \in \mathbb Q_{\ge 0}$ and $$ \sum_{j=1}^{k_{m}} \left|\mu\left(A_{j}^{m}\right)-\alpha_{j}^{m}\right|<2 / m. $$
Let $g$ be a uniformly continuous and bounded functional on $X$. By Lemmas 1 and 2, it suffices to prove $\int g \mathrm d \mu_m \to \int g \mathrm d \mu$ as $m \to \infty$. Indeed,
$$
\begin{aligned}
& \left |\int g \mathrm d \mu_m - \int g \mathrm d \mu \right | \\
= &\left | \sum_{j=1}^{k_{m}}\alpha_j^m g(a_j) - \int g \mathrm d \mu \right| \\
\le & \left | \sum_{j=1}^{k_{m}} \mu\left(A_{j}^{m}\right) g(a_j) - \int g \mathrm d \mu \right| + \frac{2}{m} \sup_j | g(a_j) | \\
\le& \left | \int \sum_{j=1}^{k_{m}} g(a_j) 1_{A_j^m} \mathrm d \mu - \int g \mathrm d \mu \right| + \frac{2}{m} \|g\|_\infty \\
=& \left | \int \sum_{j=1}^{k_{m}} [g(a_j) -g] 1_{A_j^m} \mathrm d \mu - \int g 1_{(\bigcup_{j=1}^{k_{m}} A_j^m)^c} \mathrm d \mu \right| + \frac{2}{m} \|g\|_\infty\\
\le& \sum_{j=1}^{k_{m}} \int | g(a_j) -g| 1_{A_j^m} \mathrm d \mu + \|g\|_\infty \mu \left ( \left (\bigcup_{j=1}^{k_{m}} A_j^m \right )^c \right ) + \frac{2}{m} \|g\|_\infty\\
\le& \sum_{j=1}^{k_{m}} \sup_{x\in A_j^m} | g(a_j) -g(x)| \mu (A_j^m) + \frac{1}{m} \|g\|_\infty + \frac{2}{m} \|g\|_\infty.
\end{aligned}
$$
Each $A_{j}^{m}$ is contained in the ball $B(a_j, 1/m)$. Fix $\varepsilon>0$. Since $g$ is uniformly continuous, there is a $\delta>0$ such that $|g(y)-g(x)|<\varepsilon$ whenever $d(x,y)<\delta$. Hence, for every $m$ with $1/m < \delta$, we have $\left|g(a_{j}) - g(x)\right|<\varepsilon$ for all $j$ and all $x \in A_{j}^{m}$, and the above computation gives $$ \left|\int g \mathrm d \mu_{m}-\int g \mathrm d \mu\right| \leq \varepsilon \mu(X) + \frac{3}{m} \|g\|_{\infty}. $$ Letting $m \to \infty$ and then $\varepsilon \to 0$ yields $\int g \mathrm d \mu_m \to \int g \mathrm d \mu$. Thus $\mu_m \to \mu$ weakly, so $d_P(\mu_m, \mu) \to 0$ by Lemma 1. Since each $\mu_m \in \mathcal D$, the set $\mathcal D$ is dense in $\mathcal M$. This completes the proof.
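The construction of $\mu_m$ can be traced numerically in a simple special case. The Python sketch below assumes $X=[0,1]$, $\mu$ equal to Lebesgue measure, $D$ equal to the dyadic rationals enumerated as $0, 1, 1/2, 1/4, 3/4, \ldots$, and $g(x)=\cos 3x$. All of these choices and all function names are hypothetical. The sketch builds the masses $\mu(A_j^m)$, rounds them to rationals $\alpha_j^m$, and compares $\int g \,\mathrm d\mu_m$ with $\int g \,\mathrm d\mu = \sin(3)/3$.

```python
import math
from fractions import Fraction


def dyadics():
    """Enumerate a countable dense subset D of X = [0, 1]: 0, 1, 1/2, 1/4, 3/4, 1/8, ..."""
    yield 0.0
    yield 1.0
    n = 1
    while True:
        for odd in range(1, 2 ** n, 2):
            yield odd / 2 ** n
        n += 1


def lebesgue_of_union(centers, r):
    """Lebesgue measure of the union of the balls (c - r, c + r) ∩ [0, 1]."""
    total, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in sorted((max(0.0, c - r), min(1.0, c + r)) for c in centers):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def approximating_measure(m):
    """Atoms (a_j, alpha_j^m) of mu_m for mu = Lebesgue measure on [0, 1]."""
    r = 1.0 / m
    centers = []
    for a in dyadics():  # choose k_m with mu(union of balls) >= mu(X) - 1/m
        centers.append(a)
        if lebesgue_of_union(centers, r) >= 1.0 - r:
            break
    k = len(centers)
    q = m * k  # rounding to denominator q keeps sum |mu(A_j) - alpha_j| <= 1/(2m)
    atoms, previous = [], 0.0
    for j in range(1, k + 1):
        current = lebesgue_of_union(centers[:j], r)
        mass = current - previous  # mu(A_j^m) = mu(B_1 ∪ ... ∪ B_j) - mu(B_1 ∪ ... ∪ B_{j-1})
        previous = current
        atoms.append((centers[j - 1], Fraction(round(mass * q), q)))
    return atoms


def g(x):
    return math.cos(3 * x)  # bounded and uniformly continuous (Lipschitz constant 3)


exact = math.sin(3) / 3  # integral of g with respect to mu
for m in (10, 50, 200):
    approx = sum(float(alpha) * g(a) for a, alpha in approximating_measure(m))
    print(m, abs(approx - exact))
```

Since this $g$ is Lipschitz with constant $3$ and $\|g\|_\infty \le 1$, the estimate above bounds the error here by $3/m + 1/m + 1/(2m)$. The printed errors should therefore decay at least like $1/m$.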