Of course, there are many ways of metrizing the weak topology on $\mathcal M(\Omega)$ using various tools of functional analysis. However, as Dan has already pointed out, the most natural way is to use the transportation metric on the space of measures. [It is much more natural than the Prokhorov metric. I don't want to go into historical details here (they can easily be found elsewhere), but I insist that the transportation metric should really be associated with the names of Kantorovich (in the first place) and his collaborator Rubinshtein.] Dan gives its dual definition in terms of Lipschitz functions; however, its "transport definition" is actually more appropriate here. Let me recall it.
Given two probability measures $\mu_1,\mu_2$ on $\Omega$, put
$$
\overline d(\mu_1,\mu_2) = \inf_M \int d(x_1,x_2)\, dM(x_1,x_2) \;,
$$
where $d$ is the original metric on $\Omega$, and the infimum (which is in fact attained) is taken over all probability measures $M$ on $\Omega\times\Omega$ whose marginals ($\equiv$ coordinate projections) are $\mu_1$ and $\mu_2$. One should think about such measures as "transportation plans" between distributions $\mu_1$ and $\mu_2$, while the integral in the RHS of the definition is the "cost" of the plan $M$.
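For finitely supported measures the infimum can be checked by hand. The sketch below uses a hypothetical helper `transport_distance` (not from any library) and assumes both measures are uniform on the same number $n$ of atoms. In that case the transportation plans form $1/n$ times the Birkhoff polytope of doubly stochastic matrices, whose extreme points are permutation matrices, so it suffices to minimize over permutations. It is brute force and only meant for tiny examples.

```python
from itertools import permutations
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def transport_distance(xs: Sequence[P], ys: Sequence[P],
                       d: Callable[[P, P], float]) -> float:
    """Transport distance between the uniform measures (1/n) sum delta_x
    (x in xs) and (1/n) sum delta_y (y in ys), for a metric d.

    Assumption of this sketch: both measures are uniform with the same
    number n of atoms, so an optimal plan is a permutation (Birkhoff).
    Runs in O(n!) time; only sensible for very small n.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two nonempty lists of equal length")
    # Cost of a plan that sends mass 1/n from xs[i] to ys[p[i]].
    best = min(sum(d(xs[i], ys[p[i]]) for i in range(n))
               for p in permutations(range(n)))
    return best / n
```

For example, on the real line with $d(x,y)=|x-y|$, the uniform measures on $\{0,1\}$ and $\{1/2,3\}$ are at distance $1.25$: the identity matching costs $(0.5+2)/2$, the swapped one $(3+0.5)/2$.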
It is obvious that the above definition makes sense not just for probability measures, but for any two positive measures $\mu_1,\mu_2$ with the same mass. Moreover, $\overline d(\mu_1,\mu_2)$ actually depends on the difference $\mu_1-\mu_2$ only, so that one can think about it as a "weak norm"
$$
|||\mu_1-\mu_2||| = \overline d(\mu_1,\mu_2)
$$
of the signed measure $\mu_1-\mu_2$ (clearly, it is homogeneous with respect to multiplication by scalars).
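On the real line with $d(x,y)=|x-y|$ the dependence on $\mu_1-\mu_2$ alone can be seen explicitly: there the transport cost equals $\int_{\mathbb R}|F(t)|\,dt$, where $F(t)=(\mu_1-\mu_2)((-\infty,t])$. The following sketch (hypothetical helper names; measures are finitely supported and given as `{point: weight}` dictionaries) computes this quantity. Adding a common measure $\nu$ to both $\mu_1$ and $\mu_2$ leaves the result unchanged, and scaling the difference scales the result.

```python
from typing import Dict

Measure = Dict[float, float]  # finitely supported measure: {point: weight}


def difference(mu1: Measure, mu2: Measure) -> Measure:
    """The signed measure mu1 - mu2."""
    out: Measure = {}
    for x, w in mu1.items():
        out[x] = out.get(x, 0.0) + w
    for x, w in mu2.items():
        out[x] = out.get(x, 0.0) - w
    return out


def weak_norm_1d(mu: Measure) -> float:
    """|||mu||| for a balanced signed measure on R with d(x, y) = |x - y|,
    computed as the integral of |F|, F(t) = mu((-inf, t]).

    F is built from mu alone, so the value depends only on mu1 - mu2.
    """
    if abs(sum(mu.values())) > 1e-12:
        raise ValueError("mu must be balanced (total mass 0)")
    pts = sorted(mu)
    total, cdf = 0.0, 0.0
    for left, right in zip(pts, pts[1:]):
        cdf += mu[left]                  # value of F on [left, right)
        total += abs(cdf) * (right - left)
    return total
```

With $\mu_1=\tfrac12(\delta_0+\delta_1)$ and $\mu_2=\tfrac12(\delta_{1/2}+\delta_3)$ this gives $1.25$, and it still gives $1.25$ after adding $\delta_2$ to both measures.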
Now let $\mu=\mu_1-\mu_2$ be an arbitrary signed measure, where $\mu_1,\mu_2$ are the components of its Jordan decomposition (obtained from the Hahn decomposition). The only reason why the definition of the weak norm does not work in this situation is that the measure $\mu$ need not be "balanced": the total masses $\|\mu_1\|$ and $\|\mu_2\|$ need not be the same any more. However, this can easily be repaired as follows: extend the original space $\Omega$
to a new metric space $\Omega'$ by adding to it an "ideal point" $o$ and putting $d(\omega,o)=1$ for any $\omega\in\Omega$ (for the extended $d$ to satisfy the triangle inequality one needs $\operatorname{diam}\Omega\le 2$, which can always be arranged by replacing $d$ with $\min(d,1)$ without changing the topology). Then the measure
$$
\mu'=\mu - (\|\mu_1\|-\|\mu_2\|)\delta_o \;,
$$
where $\delta_o$ is the unit mass at the point $o$, is now balanced, so that $|||\mu'|||$ is well defined. Therefore, one can extend the definition of the weak norm $|||\cdot|||$ to arbitrary signed measures $\mu$ by putting
$$
|||\mu|||=|||\mu'||| \;.
$$
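Here is a toy version of this construction. It assumes $\Omega=[0,1]$ with $d(x,y)=|x-y|$ (so the extended $d$ is still a metric) and that $\mu_1,\mu_2$ consist of unit atoms. The names `weak_norm`, `d_ext` and `IDEAL` are hypothetical. The lighter of $\mu_1,\mu_2$ is padded with atoms at $o$, which is exactly the passage from $\mu$ to $\mu'$, and the balanced problem is then solved by brute force over permutations.

```python
from itertools import permutations
from typing import List, Optional

IDEAL = None  # stands for the ideal point o


def d_ext(x: Optional[float], y: Optional[float]) -> float:
    """Metric on Omega' = [0, 1] plus {o}: |x - y| on [0, 1], d(x, o) = 1.
    Assumption: Omega = [0, 1], so diam <= 2 and the triangle inequality holds."""
    if x is IDEAL and y is IDEAL:
        return 0.0
    if x is IDEAL or y is IDEAL:
        return 1.0
    return abs(x - y)


def weak_norm(pos: List[float], neg: List[float]) -> float:
    """|||mu||| for mu = sum(delta_x, x in pos) - sum(delta_y, y in neg).

    pos / neg list the unit atoms of mu_1 / mu_2. Padding the shorter list
    with atoms at o turns mu into the balanced measure mu'; its transport
    cost is then minimized over all matchings of unit atoms.
    """
    k = max(len(pos), len(neg))
    xs = list(pos) + [IDEAL] * (k - len(pos))
    ys = list(neg) + [IDEAL] * (k - len(neg))
    return min(sum(d_ext(xs[i], ys[p[i]]) for i in range(k))
               for p in permutations(range(k)))
```

For instance, `weak_norm([0.0], [])` is $1$ (the mass goes to $o$), `weak_norm([0.0, 0.0], [0.5])` is $0.5+1=1.5$, and `weak_norm([0.1], [0.0])` is $0.1$. More generally $|||\delta_{1/n}-\delta_0|||=1/n\to0$, while the total variation norm of $\delta_{1/n}-\delta_0$ stays equal to $2$.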
It is now easy to see that the distance $|||\mu_1-\mu_2|||$, where $\mu_1,\mu_2$ are two arbitrary signed measures, metrizes the weak topology on $\mathcal M(\Omega)$.
A proof along the lines that you describe was worked out by V.S. Sunder.
There are a few different approaches that you can take to reduce the Riesz Representation Theorem to a class of simpler spaces. For compact spaces, you can either reduce to compact metric spaces like Sunder does or directly to projective (i.e. extremally disconnected) compact spaces like Carothers.
Sunder's reduction to the compact metric case relies on the fact that every Baire set in a compact space is a continuous preimage of a Baire set in $[0, 1]^\mathbb{N}$. From here, you have two choices. You can either reduce to the case of $2^\mathbb{N}$ as you describe, or you can work directly with $[0, 1]^\mathbb{N}$. Sunder does the former, but I think you should also be able to do the latter and use an explicit proof for $[0, 1]$, e.g. in terms of Bernstein polynomials.
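To illustrate the Bernstein-polynomial idea (a sketch of the mechanism only, not a complete proof): if $\Lambda$ is a positive linear functional on $C[0,1]$, then $\Lambda(B_nf)=\sum_k f(k/n)\,\Lambda(b_{n,k})$ with $b_{n,k}(x)=\binom nk x^k(1-x)^{n-k}$. So $\Lambda\circ B_n$ is integration against the discrete measure with mass $\Lambda(b_{n,k})\ge0$ at $k/n$, and $\Lambda(B_nf)\to\Lambda(f)$ because $B_nf\to f$ uniformly and positive functionals are bounded. The helper names below are hypothetical, and $\Lambda$ is passed in as a black box.

```python
from math import comb
from typing import Callable, List, Tuple

Func = Callable[[float], float]
Functional = Callable[[Func], float]


def bernstein_basis(n: int, k: int) -> Func:
    """b_{n,k}(x) = C(n, k) x^k (1 - x)^(n - k), nonnegative on [0, 1]."""
    return lambda x: comb(n, k) * x ** k * (1 - x) ** (n - k)


def discrete_measure(functional: Functional, n: int) -> List[Tuple[float, float]]:
    """Atoms k/n with weights Lambda(b_{n,k}). Positivity of Lambda makes
    the weights nonnegative; sum_k b_{n,k} = 1 makes the total mass Lambda(1)."""
    return [(k / n, functional(bernstein_basis(n, k))) for k in range(n + 1)]


def integrate(measure: List[Tuple[float, float]], f: Func) -> float:
    """sum_k f(k/n) Lambda(b_{n,k}), which equals Lambda(B_n f) by linearity."""
    return sum(w * f(x) for x, w in measure)
```

With $\Lambda(g)=g(0.3)$ the weights form the binomial distribution, and for $f(x)=x^2$ the sketch returns $B_{100}f(0.3)=0.09+0.3\cdot0.7/100=0.0921$, close to $\Lambda(f)=0.09$.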
Carothers takes a more direct approach. Given a compact space $X$, he considers the projective cover of $X$ given by the Stone-Čech compactification of the discretization of $X$, where the clopen sets generate the Baire sets and finite additivity on clopen sets implies countable additivity. I personally feel that this does a better job than the metric space proof of motivating more advanced mathematics, since the use of projective covers of topological spaces corresponds to the use of injective envelopes of $C^*$-algebras.
To generalize from compact spaces to locally compact spaces, you have a few choices. You can use the fact that locally compact spaces are compactly generated, which is roughly what Sunder does. You can also use compactifications, e.g. the one-point or Stone-Čech compactifications.
The use of the one-point compactification corresponds to the unitization of $C(X)$ as a $C^*$-algebra, where positive linear functionals have unique extensions to unitizations by taking a limit over a contractive approximate identity, i.e. again by approximating from compact subsets. If you worked it out explicitly in terms of topological spaces, it would look pretty similar to Sunder's proof.
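A toy illustration of the unitization step, with a hypothetical positive functional $\Lambda(f)=\sum_{n\ge0}2^{-(n+1)}f(n)$ on $C_0(\mathbb R)$ and the approximate identity $e_k$ (equal to $1$ on $[-k,k]$, decreasing linearly to $0$ on $k\le|x|\le k+1$). The extension is $\tilde\Lambda(f+c\mathbf 1)=\Lambda(f)+c\lim_k\Lambda(e_k)$, and here $\Lambda(e_k)=1-2^{-(k+1)}\to1=\|\Lambda\|$. To keep the sums finite, the sketch assumes the caller only passes functions vanishing for $|x|\ge$ `cutoff`.

```python
from typing import Callable

Func = Callable[[float], float]


def approx_unit(k: int) -> Func:
    """e_k: 1 on [-k, k], linear down to 0 on k <= |x| <= k + 1, 0 beyond.
    The e_k form an increasing contractive approximate identity for C_0(R)."""
    return lambda x: min(1.0, max(0.0, k + 1 - abs(x)))


def Lambda(f: Func, cutoff: int) -> float:
    """Hypothetical positive functional: sum_{n >= 0} 2^{-(n+1)} f(n).
    Assumes f(x) = 0 for |x| >= cutoff, so only n < cutoff contribute."""
    return sum(2.0 ** -(n + 1) * f(n) for n in range(cutoff))


def extended(f: Func, cutoff: int, c: float, k: int = 60) -> float:
    """Approximates Lambda~(f + c*1) = Lambda(f) + c * lim_k Lambda(e_k),
    replacing the limit by Lambda(e_k) for a large k (e_k vanishes at |x| >= k + 1)."""
    return Lambda(f, cutoff) + c * Lambda(approx_unit(k), k + 1)
```

For example, `Lambda(approx_unit(5), 6)` is $1-2^{-6}=0.984375$, and the extension assigns to the constant function $2$ the value $2\|\Lambda\|=2$.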
The approach using the Stone-Čech compactification is a bit more interesting because it generalizes to arbitrary completely regular spaces. The best treatment of this I have seen is Measures on Topological Spaces by J.D. Knowles. The approach there uses both the Markov-Alexandrov approach of representing linear functionals on $C_b(X)$ in terms of finitely additive measures, as well as the Riesz-Kakutani approach on $C(\beta X)$ in terms of countably additive measures, and then studies the relationship between finite additivity, countable additivity, and inner approximations by compact sets.
Theorem 1 is essentially the representation theorem (or the "representation definition" in Bourbaki) for the Alexandroff one-point compactification of a T$_2$ locally compact space (passing from continuous functions vanishing at infinity to all continuous functions with an arbitrary finite value at infinity does not change the dual, except for adding the delta measures concentrated at the point at infinity). And yes, the dual (for a compact T$_2$ space) is the space of finite regular Borel measures or, equivalently, of finite measures on the Baire $\sigma$-algebra, the smallest one that makes all continuous functions measurable (all such Baire measures are regular; the Baire $\sigma$-algebra is smaller, being generated by the zero sets of continuous functions, i.e. the closed (= compact) G$_\delta$ sets, rather than by all closed (= compact) sets).
It does not apply when there are not enough continuous (real- or complex-valued) functions, so you can apply it to a space whose associated T$_0$ space is locally compact T$_2$, but not in general. I think that Bourbaki has something in the exercises on (possibly non-T$_1$) quasi-compact and normal spaces.
To see easy examples of quasi-compact T$_0$ spaces with too few real or complex continuous functions, consider finite T$_0$ spaces, which are the same thing as finite posets (with the principal order filters as a basis for the topology).
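To make "too few continuous functions" concrete: if $f\colon X\to\mathbb R$ is continuous on a finite T$_0$ space and $x$ lies in the closure of $\{y\}$, then $f(x)$ lies in the closure of $\{f(y)\}$, i.e. $f(x)=f(y)$. Since the connected components of a finite space are clopen, the continuous real functions are exactly the functions constant on the connected components of the comparability graph of the poset, so $\dim C(X)$ is the number of such components. The sketch below (a hypothetical helper using union-find; the order is given by generating pairs $x\le y$) counts them.

```python
from typing import Dict, Hashable, List, Set, Tuple


def comparability_components(points: List[Hashable],
                             order: List[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Connected components of the comparability graph of a finite poset.
    Continuous real functions on the space are exactly the functions that
    are constant on each component."""
    parent: Dict[Hashable, Hashable] = {p: p for p in points}

    def find(p: Hashable) -> Hashable:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for x, y in order:                     # each pair x <= y joins two classes
        parent[find(x)] = find(y)
    groups: Dict[Hashable, Set[Hashable]] = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())
```

For the two-point Sierpiński space ($x\le y$) there is a single component, so $C(X)$ consists of the constants only, although $X$ has two points.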
Edit. Incorporating bathalf15320's absolutely correct comment: if you want the dual of the space of bounded continuous functions on an arbitrary space, note that by Gelfand duality you are considering the maximal, universal compactification (the Stone-Čech compactification) of the T$_0$ complete regularization of the initial space, hence the dual is given by the Radon measures on that compact T$_2$ space. But the universal compactification is much more difficult to understand than the one-point compactification (the minimal one, which exists only for locally compact T$_2$ spaces).
As for T$_2$, already the old general topology book by Kelley noted (essentially) that what one needs in analysis is not T$_2$ itself, but that the associated T$_0$ space (identify points that lie in the same closed sets, i.e. in the same open sets; for example, in spaces of measurable functions, identify functions that are equal a.e.; more generally, for topologies and uniformities defined by a family of pseudo-metrics, identify points at distance 0 for all such pseudo-metrics) has the following properties, listed in ascending order of strength:
T$_2$: equivalent to uniqueness of limits (of nets or filters);
T$_3$ (regular and T$_0$): equivalent to the previous property plus "the only possible way of extending a function by continuity (i.e. by taking limits) always works, giving a continuous function";
T$_{3+1/2}$ (T$_0$ and completely regular: points and closed sets are separated by a continuous real-valued function; equivalently, the topology is induced by a family of pseudo-metrics; equivalently, by a uniformity): equivalent to being embeddable as a subspace of a compact T$_2$ space (equivalently, to being a subspace of a normal T$_1$ space).
[Having more open sets than some T$_{3+1/2}$ topology is equivalent to "distinct points are separated by a continuous real-valued function"; this property is independent of T$_3$ and lies between T$_2$ and T$_{3+1/2}$, but when one talks about "sufficiently many continuous functions" one usually means the stronger condition T$_{3+1/2}$.]
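The passage to the associated T$_0$ space is easy to experiment with on finite spaces. The sketch below uses hypothetical helpers; a topology is given as an explicit list of open sets (assumed to be a valid topology). It groups points lying in exactly the same open sets and tests the Hausdorff property of the quotient by brute force. Every open set is a union of classes, so its image is open in the quotient, and these images are all the quotient's open sets.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set


def kolmogorov_quotient(points: Iterable[Hashable],
                        opens: List[FrozenSet]) -> List[Set[Hashable]]:
    """Classes of points contained in exactly the same open sets."""
    classes: Dict[FrozenSet[int], Set[Hashable]] = {}
    for p in points:
        signature = frozenset(i for i, U in enumerate(opens) if p in U)
        classes.setdefault(signature, set()).add(p)
    return list(classes.values())


def is_hausdorff(points: Iterable[Hashable], opens: List[FrozenSet]) -> bool:
    """Brute force: distinct points must have disjoint open neighbourhoods."""
    pts = list(points)
    for i, x in enumerate(pts):
        for y in pts[i + 1:]:
            if not any(x in U and y in V and not (U & V)
                       for U in opens for V in opens):
                return False
    return True


def quotient_is_hausdorff(points: List[Hashable], opens: List[FrozenSet]) -> bool:
    """T2 test for the associated T0 space (the Kolmogorov quotient)."""
    classes = kolmogorov_quotient(points, opens)
    rep = {p: i for i, cls in enumerate(classes) for p in cls}
    q_opens = [frozenset(rep[p] for p in U) for U in opens]  # images of opens
    return is_hausdorff(range(len(classes)), q_opens)
```

For example, $X=\{a,b,c\}$ with open sets $\emptyset,\{a,b\},\{c\},X$ is not T$_2$ (not even T$_0$), but its associated T$_0$ space is a two-point discrete space, hence T$_2$. The Sierpiński space is already T$_0$ and is not T$_2$.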
Now, sometimes one defines "locally compact" as "each point has a basis of quasi-compact closed neighborhoods", which is the same as "regular and each point has a quasi-compact neighborhood", or also as "the associated T$_0$ space is locally compact T$_2$". [Kelley's book made heavy use of the "regular" trick to avoid assuming Hausdorff.]