So let's start a step back because your coherent states are not normalized as I would normalize them.
Coherent states
The coherent states are defined by their response to the bosonic annihilator, $$\hat b |x , y\rangle = (x + i y) |x,y\rangle.$$From this one can derive that the expansion of any particular one in the number states must satisfy $$\hat b~\sum_n c_n |n\rangle = \sum_n c_n \sqrt{n} |n-1\rangle=(x + i y) \sum_n c_n |n\rangle,$$giving the recursion relation $c_n = \frac{x+iy}{\sqrt n}~c_{n-1}.$ Starting from $c_0$ we then find $$|x,y\rangle = c_0~\sum_n \frac{(x + i y)^n}{\sqrt{n!}} |n\rangle.$$The remaining $c_0$ is fixed by normalization: $$\langle x,y|x,y\rangle = 1 = |c_0|^2 \sum_n \frac{(x-iy)^n(x+iy)^n}{n!} = |c_0|^2 \exp\big(x^2 + y^2\big).$$Choosing these to all have the same complex phase for their vacuum component finally yields $$|x, y\rangle = \exp\left(-\frac12(x^2 + y^2)\right)\sum_n \frac{(x + i y)^n}{\sqrt{n!}}~|n\rangle.$$
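As a quick sanity check of the expansion above, here is a minimal numerical sketch. The truncation `n_max`, the sample values of `x` and `y`, and the helper name `coherent_coeffs` are all assumptions made for illustration; the check only holds up to truncation error.

```python
import math
import numpy as np

def coherent_coeffs(x, y, n_max):
    """Coefficients c_n = exp(-|a|^2 / 2) a^n / sqrt(n!) with a = x + i y, for n < n_max."""
    alpha = complex(x, y)
    n = np.arange(n_max)
    # log(sqrt(n!)) via lgamma, to avoid overflowing n! for larger n_max
    log_sqrt_fact = 0.5 * np.array([math.lgamma(k + 1) for k in n])
    return np.exp(-0.5 * abs(alpha) ** 2) * alpha ** n / np.exp(log_sqrt_fact)

# Hypothetical sample point and truncation.
x, y, n_max = 0.7, -0.4, 40
c = coherent_coeffs(x, y, n_max)

# Normalization: sum_n |c_n|^2 should be 1 up to truncation error.
norm = np.sum(np.abs(c) ** 2)

# Truncated annihilator: b|n> = sqrt(n)|n-1>, so b[n-1, n] = sqrt(n).
b = np.diag(np.sqrt(np.arange(1, n_max)), k=1)

# Eigenvalue equation b|x,y> = (x + i y)|x,y>, again up to truncation error.
residual = np.max(np.abs(b @ c - complex(x, y) * c))
print(norm, residual)
```

For small $|x + iy|$ the truncation error is negligible; for larger amplitudes `n_max` would have to grow accordingly.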
So the question is, why does your expression have a leading $\pi^{-1/2}$ in it? That's because they resolve the identity in a somewhat weird way. What does that mean?
Resolving the identity
Suppose you have an expression for some average $\langle A \rangle.$ QM is very clear that this expression may be written based on its quantum state $|\psi\rangle$ as $\langle \psi|\hat A|\psi\rangle.$
But using the fact that $1 = \sum_n |n\rangle\langle n|,$ for example, we can insert these sums ad-hoc into that expression to find that in fact this expectation value also reads, $$\langle A \rangle = \sum_{mn} \langle\psi|m\rangle\langle m|\hat A|n\rangle\langle n|\psi\rangle = \sum_{mn} \psi^*_m~A_{mn}~\psi_n.$$ So that is the value of resolving the identity; it means that you can define this matrix $A_{mn}$ which fully specifies the action of $\hat A$ on the Hilbert space, recovering every single expectation value from the matrix.
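In a finite-dimensional toy model this is just the statement that the double sum equals a matrix product. The sketch below assumes a 4-dimensional space with a randomly generated state and Hermitian matrix; neither comes from the question, they only illustrate the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # hypothetical dimension of the toy Hilbert space

# Components psi_n = <n|psi> of a normalized state in the basis |n>.
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

# Matrix elements A_mn = <m|A|n> of a Hermitian observable.
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
A = (M + M.conj().T) / 2

# Explicit double sum over m and n of psi_m^* A_mn psi_n.
double_sum = sum(
    psi[m].conj() * A[m, n] * psi[n] for m in range(dim) for n in range(dim)
)

# The same number written as <psi|A|psi>; np.vdot conjugates its first argument.
compact = np.vdot(psi, A @ psi)
print(double_sum, compact)
```

Both numbers agree, and the imaginary part vanishes up to rounding because `A` is Hermitian.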
Well we see something very similar when we look at the operator, $$\hat Q = \int_{-\infty}^\infty dx~\int_{-\infty}^\infty dy~|x,y\rangle\langle x, y| = \sum_{mn} \iint dx~dy~e^{-x^2-y^2}\frac{(x+iy)^m(x-iy)^n}{\sqrt{m!n!}} |m\rangle\langle n|.$$
At this point it is useful to shift to polar coordinates where $x + i y = r e^{i\theta},$ yielding $$\hat Q = \sum_{mn}\int_{0}^\infty dr~\int_0^{2\pi} r~d\theta~e^{-r^2}~\frac{r^{m+n} e^{i(m-n)\theta}}{\sqrt{m!n!}} |m\rangle\langle n|.$$ Note that the integral over $\theta$ integrates a sinusoid over one or more full periods and therefore vanishes if $m\ne n$; it is $2\pi$ if $m = n$, so we
must get:$$\hat Q = \pi\sum_{n}\int_{0}^\infty dr~2r~e^{-r^2}~\frac{r^{2n} }{n!} |n\rangle\langle n|.$$Substituting $u=r^2, du=2r~dr$ we find that this is:$$\hat Q = \pi\sum_{n}\frac1{n!}~|n\rangle\langle n|~\int_{0}^\infty du~e^{-u}~u^n.$$If you've never seen the gamma function before, the integral on the right hand side is $n!$ and in fact it is the canonical way to extend the factorial function to non-integers to find e.g. that $(-1/2)! = \sqrt{\pi},$ though of course we only need the integers here. After cancelling that through we find out that in fact, $$\hat Q = \pi,$$ or in other words we recover this property of resolving the identity even though not all of these functions are orthogonal, because the way that they're non-orthogonal just comes down to a constant multiplicative factor. We can therefore state unequivocally, $$1 = \iint dx~dy~\frac1\pi~|x,y\rangle\langle x,y|.$$ Your expression absorbs a $1/\sqrt{\pi}$ term into each of these kets, and writes $\pi^{-1/2} |x, y\rangle = |\alpha\rangle$ (where $\alpha = x + i y$) for short, both of which help in writing these expansions. One then finds similarly to the above expression with $A_{mn}$, that $$\langle A \rangle = \iint d^2\alpha~d^2\beta~\psi^*(\alpha)~A(\alpha,\beta)~\psi(\beta).$$The only cost to this notation is that we then have to express the above integrals with the more clumsy $\int d^2\alpha$ which is short for something like $d\alpha_x~d\alpha_y$ where $\alpha = \alpha_x + i \alpha_y.$
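The claim $\hat Q = \pi$ can also be checked numerically on the lowest few number states. This is only a sketch: the grid spacing `h`, the cutoff at $|x|,|y| \le 8$, and the restriction to `n_max = 5` are assumptions chosen so that a plain Riemann sum is accurate.

```python
import math
import numpy as np

n_max = 5    # check matrix elements <m|Q|n> for m, n < n_max (assumed cutoff)
h = 0.05     # grid spacing (assumed fine enough for the Gaussian integrand)
grid = np.arange(-8.0, 8.0 + h / 2, h)
X, Y = np.meshgrid(grid, grid, indexing="ij")
alpha = X + 1j * Y

# c[n] holds the coefficient <n|x,y> at every grid point.
c = np.array([
    np.exp(-0.5 * np.abs(alpha) ** 2) * alpha ** n / math.sqrt(math.factorial(n))
    for n in range(n_max)
])

# Q[m, n] = iint dx dy <m|x,y><x,y|n>, approximated by a Riemann sum.
Q = np.einsum("mxy,nxy->mn", c, c.conj()) * h * h

# Dividing by pi should give something very close to the identity matrix.
print(np.round((Q / np.pi).real, 6))
```

The off-diagonal entries vanish for the same reason as in the polar-coordinate argument: the phase $e^{i(m-n)\theta}$ averages out over the full plane.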
The spectral theorem is that, if $A: D(A) \to {\cal H}$ is a selfadjoint operator, where $D(A) \subset {\cal H}$ is a dense subspace, then there exists a unique projector-valued measure $P^{(A)}$ on the Borel sets of $\mathbb{R}$
such that $$A = \int_{\mathbb R} \lambda dP^{(A)}(\lambda)\:.$$
As a consequence (this is a corollary or a definition depending on the procedure)
$$f(A) = \int_{\mathbb R} f(\lambda) dP^{(A)}(\lambda) \tag{1}$$
for every $f: {\mathbb R} \to {\mathbb C}$ Borel measurable. Taking $f(x) =1$ for all $x\in {\mathbb R}$ we have
$$I = \int_{\mathbb R} dP^{(A)}(\lambda)\:.$$
For selfadjoint operators admitting a Hilbert basis of eigenvectors $\psi_{\lambda, d_\lambda}$, with $\lambda \in \sigma_p(A)$ and $d_\lambda$ labelling the basis vectors within the eigenspace with eigenvalue $\lambda$, the identity above reads (referring to the strong operator-topology)
$$f(A) = \sum_{\lambda, d_\lambda} f(\lambda) |\psi_{\lambda, d_\lambda}\rangle\langle \psi_{\lambda, d_\lambda} |\:, \tag{2}$$
with the special case
$$I = \sum_{\lambda, d_\lambda} |\psi_{\lambda, d_\lambda}\rangle\langle \psi_{\lambda, d_\lambda} |\:. \tag{3}$$
In summary Eqs.(1) and (2) are the central identities, Eq.(3) is just a special case.
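In finite dimensions every selfadjoint operator has a basis of eigenvectors, so Eqs. (2) and (3) can be checked directly. The following sketch assumes a randomly generated $5\times 5$ Hermitian matrix and a hypothetical helper `f_of_A`; it illustrates the identities and says nothing about the unbounded case.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
A = (M + M.conj().T) / 2      # hypothetical selfadjoint matrix

# eigh returns real eigenvalues and orthonormal eigenvectors (columns of V).
lam, V = np.linalg.eigh(A)

def f_of_A(f):
    """Eq. (2): sum over eigenpairs of f(lambda) |psi><psi|."""
    return sum(
        f(l) * np.outer(V[:, k], V[:, k].conj()) for k, l in enumerate(lam)
    )

identity = f_of_A(lambda l: 1.0)       # Eq. (3): f = 1
A_rebuilt = f_of_A(lambda l: l)        # f(lambda) = lambda recovers A
A_squared = f_of_A(lambda l: l ** 2)   # f(lambda) = lambda^2 gives A @ A

print(np.allclose(identity, np.eye(5)),
      np.allclose(A_rebuilt, A),
      np.allclose(A_squared, A @ A))
```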
Given an orthonormal complete basis $\{\psi_n\}_{n \in \mathbb N} \subset {\cal H}$, one can always define ad hoc a selfadjoint operator $A$ (with no physical meaning in general) to implement the identities above:
$$A = \sum_{n \in \mathbb{N}} \lambda_n |\psi_{n}\rangle\langle \psi_{n} |$$ for a given arbitrary choice of real numbers $\lambda_n$.
The domain of $A$ is
$$\left\{\psi \in {\cal H} \: \left| \: \sum_{n} |\lambda_n|^2 |\langle \psi_n| \psi \rangle|^2 < +\infty\right. \right\}$$
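The domain condition is not vacuous when the $\lambda_n$ are unbounded. As a sketch, take the hypothetical choice $\lambda_n = n$ and compare two vectors with coefficients $\langle\psi_n|\psi\rangle = 1/n$ and $1/n^2$ (both choices are assumptions for illustration). Both are normalizable, but only the second lies in the domain of $A$.

```python
import numpy as np

def partial_sums(coeff, lam, N):
    """Partial sums up to N of |c_n|^2 and of |lambda_n|^2 |c_n|^2."""
    n = np.arange(1, N + 1, dtype=float)
    c = coeff(n)
    return np.sum(c ** 2), np.sum((lam(n) * c) ** 2)

lam = lambda n: n  # hypothetical eigenvalues lambda_n = n

for N in (10**2, 10**4, 10**6):
    # c_n = 1/n: the norm converges, the domain sum grows like N.
    norm_out, dom_out = partial_sums(lambda n: 1.0 / n, lam, N)
    # c_n = 1/n^2: both sums converge, so this vector is in D(A).
    norm_in, dom_in = partial_sums(lambda n: 1.0 / n ** 2, lam, N)
    print(N, norm_out, dom_out, norm_in, dom_in)
```

The second column of each pair tells the story: for $c_n = 1/n$ the domain sum keeps growing with $N$, while for $c_n = 1/n^2$ it settles near $\pi^2/6$.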
Best Answer
I will address the second question primarily, because there is an important point here about the notation.
Right from the beginning, the following statement $$\hat{p} = -i\hbar \partial_x$$ doesn't make sense, especially when you then go on to act on abstract ket vectors. The operator $\hat{p}$ is an abstract operator acting on abstract ket-space whose representation in position-space is given by $\frac{\hbar}{i}\frac{\partial}{\partial x}$. You cannot act with $\frac{\hbar}{i}\frac{\partial}{\partial x}$ on kets because these objects live in different worlds.
To explain, note that if we have an abstract ket (quantum state) $\lvert \psi\rangle$, we can compute its expansion coefficients in a basis $\lvert \varphi_n\rangle$ by computing the inner products $\langle \varphi_n | \psi\rangle$. Then, you can think of $\langle \varphi_n | \psi\rangle$ as the $n$'th element in a column-vector representation of $\lvert \psi\rangle$, i.e., $$ \lvert{\psi}\rangle \to \begin{bmatrix} \langle \varphi_1 | \psi\rangle \\ \langle \varphi_2 | \psi\rangle \\ \langle \varphi_3 | \psi\rangle \\ \vdots \end{bmatrix}\,. $$
In roughly the same way, we can expand the vector in the position eigenbasis (i.e., $\{\lvert x\rangle\}$) by computing the inner products $\langle x | \psi\rangle$. Very roughly speaking, you can think of $\langle x | \psi\rangle$ as an element in a column-vector representation of $\lvert \psi\rangle$, just as above. The problem is that we have a continuum of $x$'s, and so a column-vector representation (to the extent that it even makes sense) is computationally intractable. For that reason, we think of $\langle x | \psi \rangle$ as a function $\psi(x) = \langle x | \psi \rangle$ and call it the wave function, although we should really be thinking about it as the position-space representation of the quantum state $\lvert \psi\rangle$.
Now, in order to act with $\hat{p}$, we need to (1) know how $\hat{p}$ acts on the arbitrary state $\lvert\psi\rangle$ (which is usually not possible), (2) work in momentum space, or (3) work in position space. The last of these is what is relevant here. It turns out that the proper way to understand things is as follows: $$ \langle x \lvert \hat{p} \rvert \psi \rangle = \frac{\hbar}{i}\frac{\partial}{\partial x}\psi(x)\,. $$ In words, the position-space representation of the state $\hat{p}\lvert \psi\rangle$, arrived at by acting with the momentum operator on the quantum state $\lvert \psi\rangle$, is exactly the derivative of $\psi(x)$, up to some constants. We can derive this by appealing to option (2) above, working in the momentum basis and using the commutation relation between $\hat{x}$ and $\hat{p}$ or, equivalently, using the Fourier relationship between momentum space and position space.
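The agreement between options (2) and (3) can be illustrated numerically. The sketch below assumes units with $\hbar = 1$, a periodic grid standing in for the real line, and a hypothetical Gaussian wave packet as the test state; it compares the analytic position-space derivative with multiplication by $\hbar k$ in Fourier space.

```python
import numpy as np

hbar = 1.0                 # units with hbar = 1 (assumption for this sketch)
N, L = 1024, 40.0          # grid size and box length (assumed large enough)
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]

# Hypothetical test state: Gaussian wave packet with mean wavenumber k0.
k0 = 1.5
psi = np.exp(-x ** 2 / 2) * np.exp(1j * k0 * x)

# Option (3): position space. For this psi, d(psi)/dx = (-x + i k0) psi exactly.
p_psi_position = (hbar / 1j) * (-x + 1j * k0) * psi

# Option (2): momentum space, where p-hat acts as multiplication by hbar * k.
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
p_psi_momentum = np.fft.ifft(hbar * k * np.fft.fft(psi))

max_diff = np.max(np.abs(p_psi_position - p_psi_momentum))
print(max_diff)
```

The two arrays are both samples of $\langle x|\hat p|\psi\rangle$; at no point does a derivative act on a ket.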
In any case, all of that is to say that something like $$ \hat{p} = (-i\hbar \partial_x)(\sum_{a'}\vert a' \rangle \langle a' \vert) = -i\hbar(\sum_{a'}\vert a' \rangle \partial_x \langle a' \vert)$$ doesn't make sense, because you are mixing two different representations together, which makes things fall apart.
In the derivation of the propagator done in Sakurai, that basis $\{\lvert a \rangle\}$ is assumed to be the eigenbasis of the Hamiltonian $\hat{H}$, and the abstract operator $\hat{H}$ is allowed only to act on kets $\lvert a\rangle$ and never on the dual vectors $\langle a\rvert$ or anything else. Once the matrix elements $$ \langle a'' \lvert e^{-i \hat{H}t/\hbar} \rvert a' \rangle $$ have been computed, then they can be moved anywhere, because they are numbers (complex numbers, sure, but still numbers) and not operators.
To parallel that discussion with the momentum operator, consider the eigenbasis $\{\lvert p \rangle\}$ of the momentum operator, such that $$ \hat{p}\lvert p \rangle = p \lvert p \rangle\,. $$ Then, to figure out what $\hat{p}$ looks like explicitly in terms of its eigenbasis, we can write \begin{align} \hat{p} &= \left( \int_{-\infty}^{\infty}dp'\,\lvert p' \rangle\langle p' \rvert \right) \hat{p}\left( \int_{-\infty}^{\infty}dp\,\lvert p \rangle\langle p \rvert \right) \\ &= \int_{-\infty}^{\infty}dp'\,\lvert p' \rangle\langle p' \rvert \int_{-\infty}^{\infty}dp\,\hat{p}\lvert p \rangle\langle p \rvert \\ &= \int_{-\infty}^{\infty}dp\int_{-\infty}^{\infty}dp'\, \lvert p' \rangle \langle p' \rvert\hat{p}\lvert p \rangle \langle p \rvert\,, \end{align} and since $$ \langle p' \rvert\hat{p}\lvert p \rangle = \langle p' \rvert {p}\lvert p \rangle = p\langle p' | p \rangle = p\delta(p-p')\,, $$ this becomes \begin{align} \hat{p} &= \int_{-\infty}^{\infty}dp\int_{-\infty}^{\infty}dp'\, \lvert p' \rangle p\delta(p-p') \langle p \rvert \\&= \int_{-\infty}^{\infty}dp\,p \lvert p \rangle \langle p \rvert\,. \end{align}
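A discrete analogue makes the structure of this result concrete. The sketch below assumes a periodic grid of `N` points, so that the momentum eigenstates become discrete plane waves and $\delta(p-p')$ becomes a Kronecker delta; the grid parameters and $\hbar = 1$ are assumptions for illustration.

```python
import numpy as np

hbar = 1.0
N, L = 64, 2 * np.pi                         # hypothetical ring of N points
x = np.arange(N) * (L / N)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # allowed wavenumbers on the ring

# Columns of U are the orthonormal discrete plane waves <x_j|k_l>.
U = np.exp(1j * np.outer(x, k)) / np.sqrt(N)

# Discrete resolution of the identity: sum_k |k><k|.
identity = U @ U.conj().T

# Discrete version of p = int dp p |p><p|: sum_k hbar*k |k><k|.
P = U @ np.diag(hbar * k) @ U.conj().T

# Each plane wave should be an eigenvector of P with eigenvalue hbar*k.
eig_residual = np.max(np.abs(P @ U - U * (hbar * k)))
print(np.allclose(identity, np.eye(N)), np.allclose(P, P.conj().T), eig_residual)
```

Here `P` is an ordinary Hermitian matrix, and its matrix elements between plane waves are plain numbers, just like the matrix elements of $e^{-i\hat H t/\hbar}$ in the propagator derivation.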
Alternatively, if we want to see what $\hat{p}$ looks like explicitly in the position representation, we would use two resolutions of the identity in terms of the position eigenstates instead, and use the fact that $$ \langle x' \lvert \hat{p} \rvert x\rangle = \delta(x-x') \frac{\hbar}{i}\frac{\partial}{\partial x}\,, $$ a derivation that you'll see in Sakurai, and in the process of the derivation, you'll see that $\frac{\hbar}{i}\frac{\partial}{\partial x}$ will only be acting on objects that look like $\langle x | \psi \rangle$ and not on kets or bras or anything else.
(Note that the last equation is equivalent to the statement that $$ \hat{p} = \int_{-\infty}^{\infty}dx\,\lvert x\rangle \frac{\hbar}{i}\frac{\partial}{\partial x}\langle x \rvert\,, $$ where the $\frac{\partial}{\partial x}$ is understood to act to the right, usually after we have applied this to a ket $\lvert \psi\rangle$.)