It is a good idea to first consider bounded operators, or even operators on finite-dimensional Hilbert spaces.
In this case, the spectrum is discrete (there are only eigenvalues, no continuous spectrum); let us denote the eigenvalues by $\lambda_n$.
In physicists' terms, the projection-valued measure is then simply
$$ \mathrm d\mu^A(\lambda) = \sum_n \delta(\lambda - \lambda_n)\, |\lambda_n\rangle\langle\lambda_n|\, \mathrm d\lambda $$
and the expression you have given reduces to the familiar
$$ A = \sum_n \lambda_n\, |\lambda_n\rangle\langle\lambda_n| . $$
We see that $A$ is diagonal in the basis of eigenvectors.
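For concreteness, here is a minimal numerical sketch of this finite-dimensional picture (the matrix $A$ is an arbitrary Hermitian example, not taken from the question): we diagonalize $A$ and rebuild it from its eigenvalues and the rank-one projectors $|\lambda_n\rangle\langle\lambda_n|$.

```python
import numpy as np

# A small Hermitian matrix standing in for a self-adjoint operator A.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns the eigenvalues and an orthonormal set of eigenvectors.
eigvals, eigvecs = np.linalg.eigh(A)

# Rebuild A as sum_n lambda_n |lambda_n><lambda_n|.
A_rebuilt = sum(lam * np.outer(v, v.conj())
                for lam, v in zip(eigvals, eigvecs.T))

print(np.allclose(A, A_rebuilt))  # the spectral sum reproduces A
```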
The integral $\int_{\sigma(A)} \lambda\,d\mu^A(\lambda)$ generalizes this to operators with continuous spectrum. In the case of continuous spectrum, $\int_a^{a+\epsilon} \mathrm d\mu^A(\lambda)$ goes to zero (i.e., the zero operator) as $\epsilon \to 0$, just like any "normal" integral with a non-atomic measure (no delta functions).
I am not so familiar with direct integrals, but it is probably helpful to also think about this formulation in the case of a finite-dimensional space.
Edit in response to the comment
Suppose we can rewrite some bounded self-adjoint operator in terms of an integral taken with respect to a projective measure, so what? How does that really help us or show that our operator is "diagonalizable" in some sense?
In a finite-dimensional system, expanding an operator $A$ in the form $A = \sum_n \lambda_n\, |\lambda_n\rangle\langle\lambda_n|$ is the diagonalization, expressed in a basis-independent way. You can immediately see that the matrix representation of $A$ in the $|\lambda_n\rangle$-basis is diagonal, and what the eigenvalues and eigenvectors are. What I was trying to explain above is that the expansion $A = \int_{\sigma(A)} \lambda\,d\mu^A(\lambda)$ generalizes that, so $A$ is diagonalizable in this sense.
It is useful mostly in the same way that the expansion $A = \sum_n \lambda_n\, |\lambda_n\rangle\langle\lambda_n|$ is useful in finite dimensions. For example, we can easily calculate functions $f(A)$:
$$ f(A) = \int_{\sigma(A)} f(\lambda)\,d\mu^A(\lambda). $$
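In finite dimensions this functional calculus amounts to "apply $f$ to the eigenvalues." A quick sketch (again with an arbitrary example matrix) checks it against ordinary matrix arithmetic:

```python
import numpy as np

# Arbitrary 2x2 Hermitian example; its eigenvalues are 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)

def f_of_A(f):
    # f(A) = sum_n f(lambda_n) |lambda_n><lambda_n|
    return sum(f(lam) * np.outer(v, v.conj())
               for lam, v in zip(eigvals, eigvecs.T))

# f(x) = x^2 must agree with the ordinary matrix product A @ A,
# and sqrt(A) must square back to A (the eigenvalues are positive).
print(np.allclose(f_of_A(lambda x: x**2), A @ A))
print(np.allclose(f_of_A(np.sqrt) @ f_of_A(np.sqrt), A))
```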
Separability is not a necessary physical requirement, at least in modern approaches. First of all, all the mathematical machinery, such as spectral theory, is valid for both separable and non-separable Hilbert spaces.
However, the Hilbert space of a quantum system turns out to be separable under some quite standard circumstances. In particular, this happens when it is the representation space of a strongly continuous unitary irreducible representation of a (finite-dimensional) Lie group, which is the case for every elementary system. The group may be the Poincaré group, the Weyl-Heisenberg group, or a more complicated group including non-abelian internal symmetries. The Fock space constructed upon that space is separable as well, by construction.
An apparent physical consequence of non-separability is that a Hamiltonian bounded from below, even if equipped with pure point spectrum only, cannot produce thermal mixed states of the usual form $e^{-\beta H}/Z_\beta$, since these operators cannot be trace class. However, the algebraic formalism offers a way out for describing thermal states in that case, using KMS algebraic states.
The structure of Fock space is independent of the notion of eigenvector of a Hamiltonian. For instance, the symmetric Fock space is nothing but the orthogonal direct sum of all possible symmetrized tensor products of the one-particle space. It does not matter whether there is a basis made of eigenvectors of one Hamiltonian or another, written in the popular notation.
ADDENDUM Separability does, however, play a role in motivating the structure of QM from more basic principles: it appears in the hypotheses of Solèr's theorem and Gleason's theorem in particular. But QM as a whole does not need that hypothesis to be stated.
A quick review, for those who are less familiar with the text. At its core, a physical theory is a mechanism for assigning probabilities to the possible outcomes of experiments. More specifically, given an $\mathbb R$-valued observable $\mathscr O$ and a (Borel-measurable) set $E\subseteq \mathbb R$, we may ask for the probability that we measure $\mathscr O$ to take its value in $E$.
In the standard formulation of quantum mechanics on an $n$-dimensional Hilbert space $\mathscr H$, we model an observable via a self-adjoint operator $\hat{\mathscr O}$. The possible outcomes of an ideal measurement correspond to the operator's spectrum $\sigma\big(\hat{\mathscr O}\big)$, which consists of the eigenvalues of $\hat{\mathscr O}$. The spectral theorem tells us that $\hat{\mathscr O}$ induces a splitting of the Hilbert space $$\mathscr H = \bigoplus_{i=1}^K V_i = V_1\oplus \ldots\oplus V_K$$ where $V_i$ is the $i^{th}$ eigenspace of $\hat{\mathscr O}$, $K$ is the number of distinct eigenvalues, and $V_i\perp V_j$ for all $i\neq j$. As a result, any vector $\psi\in \mathscr H$ can be uniquely written as $\psi = \sum_{i=1}^K \psi_i$ where $\hat{\mathscr O}\psi_i = \lambda_i \psi_i$. It is then a postulate of the theory that if the state of the system is represented by a normalized vector $\psi$, the probability of measuring $\mathscr O$ to take the value $\lambda_i$ is $\Vert \psi_i \Vert^2$.
To make this framework cleaner, we define the projection-valued measure $\pi$ which eats a Borel-measurable set $E\subseteq \mathbb R$ and spits out the projector onto the direct sum of eigenspaces whose eigenvalues lie in $E$. For example: $$\hat{\mathscr O}=\pmatrix{1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2}$$ $$\pi\big(\{1\}\big) = \pmatrix{1&0&0\\0&0&0\\0&0&0} \quad \pi\big(\{2\}\big) = \pmatrix{0&0&0\\0&1&0\\0&0&1} \quad \pi\big(\{1,2\}\big) = \pmatrix{1&0&0\\0&1&0\\0&0&1}$$ $$\pi\big(\{-7\}\big) = \pmatrix{0&0&0\\0&0&0\\0&0&0}$$ This allows us to define the spectral decomposition $\hat{\mathscr O}=\sum_{\lambda\in \mathbb R} \lambda \cdot \pi\big(\{\lambda\}\big)$. It also allows us to answer our motivating question in a straightforward way: for a state represented by a normalized vector $\psi$, the probability of measuring $\mathscr O$ to take its value in a Borel-measurable set $E\subseteq \mathbb R$ is simply $$\mathrm{Prob}_\psi(E) = \langle \psi, \pi(E) \psi\rangle$$
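The example above can be checked numerically. In the sketch below (the state $\psi$ is an arbitrary choice for illustration), we build $\pi(E)$ from the eigendecomposition and evaluate $\mathrm{Prob}_\psi(E) = \langle \psi, \pi(E)\psi\rangle$:

```python
import numpy as np

O = np.diag([1.0, 2.0, 2.0])          # the operator from the example above
eigvals, eigvecs = np.linalg.eigh(O)

def pvm(E, tol=1e-9):
    """Projector onto the sum of eigenspaces with eigenvalues in the set E."""
    P = np.zeros_like(O)
    for lam, v in zip(eigvals, eigvecs.T):
        if any(abs(lam - x) < tol for x in E):
            P += np.outer(v, v.conj())
    return P

print(pvm({1.0}))    # projects onto the lambda = 1 eigenspace
print(pvm({-7.0}))   # the zero operator: -7 is not in the spectrum

# Probability of measuring a value in E for a normalized state psi
# (psi here is an arbitrary illustrative choice).
psi = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
print(psi.conj() @ pvm({2.0}) @ psi)  # ≈ 0.5
```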
Self-adjoint operators come with a canonical set of orthogonal projectors which send vectors in $\mathscr H$ to the various eigenspaces of the operator. We use these projectors to extract the individual $\psi_i$'s from the decomposition $\psi = \sum_{i=1}^K \psi_i$, and they are orthogonal because the distinct eigenspaces of a self-adjoint operator are orthogonal.
In infinite-dimensional spaces, we run into the possibility that the spectrum of the operator in question has a continuous component. In this case, we must turn to the more sophisticated tools of functional analysis; however, if we are willing to play with the idea of generalized (non-normalizable) eigenvectors, then the only real change is that the spectral decomposition includes an integral over the continuous spectrum as well as a sum over the discrete spectrum. Since the spirit of the answer doesn't really change, I don't think it's necessary to make this explicit. If the spectrum consists purely of a discrete (but infinite) set of eigenvalues, then everything written above stays essentially the same, with the sums extended to infinity.