The proof of the spectral theorem for normal operators doesn't rely on the proof of the spectral theorem for self-adjoint operators, instead the proofs are basically identical.
How do you construct the spectral measure in the self-adjoint case? One way to do it is to look at the $C^*$-algebra generated by the self-adjoint operator $T$ on the Hilbert space $X$, let's call it $C^*(T)$. Since $C^*(T)$ is commutative, by Gelfand theory it is isomorphic to the algebra of continuous functions on the spectrum of $T$, $C(\sigma(T))$. Given $x,y\in H$, the map $C^*(T)\to\mathbb C$ given by $S\mapsto \langle Sx,y\rangle$ is a bounded linear functional, hence defines a Borel measure $\mu_{x,y}$ on $\mathbb R$, supported in $\sigma(T)$. Using these measures, we can extend the isomorphism $C(\sigma(T))\to C^*(T)$ to a homomorphism of $B(\mathbb R)\to \mathcal B(X)$ from the algebra bounded Borel functions on $\mathbb R$ to bounded operators on $X$. The spectral measure is just the restriction of this homomorphism to characteristic functions of Borel sets.
If now $T$ is normal, $C^*(T)$ is still commutative, and (again by Gelfand theory) is isomorphic to $C(\sigma(T))$, where now $\sigma(T)\subset\mathbb C$. Given $x,y\in X$, the measure $\mu_{x,y}$ is now a Borel measure on $\mathbb C$ supported in $\sigma(T)$, and in this way we obtain a homomorphism $B(\mathbb C)\to\mathcal B(X)$ from the algebra of bounded Borel functions on $\mathbb C$ to $\mathcal B(X)$, and obtain the spectral measure.
The rest of the proof of the spectral theorem should be the same.
EDIT
Hopefully this will help translate my response to language you are familiar with.
Firstly, yes, $C^*(T)$ is as you have defined it.
Secondly, basically the only difference between the two cases is that if $T$ is normal, we define the map $\Phi_0$ from polynomials in two variables $p=p(z,\overline z)$ to $B(X)$ by $\sum_{ij}a_{ij}z^i\overline z^j\mapsto \sum_{ij}a_{ij}T^i(T^*)^j$ and extend this by Stone-Weierstrass to a map $\Phi:C(\sigma(T))\to B(X)$. We need to consider bivariate polynomials in the normal case because if the set $X\subset\mathbb C$ is not a subset of $\mathbb R$, polynomials in one variable are not closed under conjugation, hence the Stone-Weierstrass theorem cannot be applied.
Thirdly, there are plenty of books out there that prove the spectral theorem for normal operators, leaving the case for self-adjoint operators as a corollary, but most of the one's I'm familiar with develop some basic $C^*$-algebra theory to make the proofs more transparent. See for instance Conway's or Rudin's functional analysis books, or Murphy's $C^*$-algebras and operator theory.
I would suggest using the following way more straightforward definition of joint spectrum. Consider $\mathcal A=C^*(N_1,\ldots,N_p)$. Then $\mathcal A$ is an abelian C$^*$-algebra, so $\mathcal A\simeq C(X)$ where $X$ is the character space of $\mathcal A$. The joint spectrum is then the set
$$
\sigma(N_1,\ldots,N_p)=\{((\gamma(N_1),\ldots,\gamma(N_p)):\ \gamma\in X\}\subset\prod_{k=1}^p\sigma(N_k).
$$
It follows from the definitions that $\sigma(N_1,\ldots,N_p)$ is homeomorphic to $X$ (as in the single variable case), which gives $\mathcal A\simeq C(\sigma(N_1,\ldots,N_p))$. One can then consider the representation $\rho:C(\sigma(N_1,\ldots,N_p))\to B(H)$ induced by $\rho(z_k)=N_k$. By Theorem IX.1.14 in Conway's A Course in Functional Analysis there exists a spectral measure $E$ on the Borel sets of $\sigma(N_1,\ldots,N_p)$ such that
$$
\rho(f)=\int_{\sigma(N_1,\ldots,N_p)}f\,dE.
$$
I think it's almost impossible for the joint spectrum to be equal to the Cartesian product of the spectra. The joint spectrum is more akin as a "curve" (in the $p=2$ case).
Finally, I cannot make sense of your last question. A single operator will have a a spectral measure on a compact subset of $\mathbb C$, whereas the one constructed here is defined over a compact subset of $\mathbb C^p$.
Best Answer
If $N$ is bounded and normal with spectral measure $E$, then $$ N_{\epsilon}=\int_{|\lambda|\ge\epsilon}zdE(z)+\int_{|\lambda| < \epsilon}\epsilon dE(z) $$ is invertible for $\epsilon > 0$ with inverse $$ N_{\epsilon}^{-1}=\int_{|\lambda|\ge \epsilon}\frac{1}{z}dE(z)+\frac{1}{\epsilon}\int_{|\lambda| < \epsilon} dE(\lambda). $$ And, $$ \|N-N_{\epsilon}\|=\|\int_{|\lambda|<\epsilon}(\epsilon-z)dE(z)\| \le 2\epsilon \|E\{ z: |z|<\epsilon\}\| \le 2\epsilon. $$