I've taken a look at the book, and this chapter is introductory and somewhat informal, so I imagine the authors are more specific in later chapters about what they mean by white noise in space or time and by the S(P)DE in your question. Nevertheless, I have addressed aspects of your question below.
A discussion of Definitions 2, 3 and 5 is contained in an answer of mine to a similar question here. Everything in that answer is real-valued (which hopefully doesn't make too much of a difference) and indexed by a single real variable (or, more precisely, by a test function of a single real variable); this can make a significant difference depending on what you want to know.
Definition 2
The random distribution that acts on $\phi$ via $(W, \phi) = \int \phi(t) B_t dt$ is just the Brownian motion $B$ (i.e. we can identify the function $B$ with the distribution $W$).
Your definition of $W'$ is then how I define white noise (denoted $X$) in the answer linked above: white noise $X$ is defined as the random distribution that acts on a test function $\phi$ by $(X, \phi) = -\int_0^\infty B(t) \phi'(t) dt$. In the parlance of the book you cite, this is white noise in time (time is the only variable in that answer). However, you can generalize this definition to white noise in space and time (see the discussion of Definition 3 below).
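As a quick numerical sanity check (my own sketch, not from the book), we can discretize this definition: sample Brownian paths on $[0,1]$, pair them with the illustrative test function $\phi(t) = \sin(\pi t)$ (chosen so that $\phi$ vanishes at both endpoints and the boundary term in the integration by parts drops out), and confirm that $(X, \phi)$ is centered with variance $\|\phi\|_{L^2}^2 = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# phi(t) = sin(pi t) is an illustrative test function vanishing at 0 and 1,
# so the boundary term phi(1)B(1) - phi(0)B(0) in the integration by parts is 0.
n, n_samples = 2000, 5000
t = np.linspace(0.0, 1.0, n + 1)
dt = t[1] - t[0]
phi = np.sin(np.pi * t)
dphi = np.pi * np.cos(np.pi * t)   # phi'(t)

# Brownian paths on [0, 1]: B_0 = 0, independent N(0, dt) increments
dB = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n))
B = np.concatenate((np.zeros((n_samples, 1)), np.cumsum(dB, axis=1)), axis=1)

# (X, phi) = -int_0^1 B(t) phi'(t) dt, one Riemann sum per path
X_phi = -(B * dphi).sum(axis=1) * dt

print(X_phi.mean())   # ~ 0
print(X_phi.var())    # ~ ||phi||_{L^2}^2 = 1/2
```

The sample variance lands near $1/2 = \int_0^1 \sin^2(\pi t)\,dt$, as the Itô isometry predicts.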
Definition 3
Here $W$ is your white noise (not $W'$ as in Definition 2).
To link this to Definition 2, set $d = 0$ (so there is no spatial component to the domain of $\phi$). With $X$ defined as above, $(X_\phi := (X, \phi) : \phi \in C^\infty([0, \infty)))$ is a centered Gaussian process with covariance $E(X_\phi X_\psi) = (\phi, \psi)_{L^2}$ (by the Ito isometry). The definition you have stated is a generalization to the case where the process is indexed by space and time (more precisely, by test functions of space and time).
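The covariance identity can also be checked by Monte Carlo (again my own sketch; the test functions $\phi = \sin(\pi t)$ and $\psi = \sin(\pi t) + \sin(2\pi t)$ are illustrative choices): approximate $X_\phi$ by Itô sums $\sum_i \phi(t_i)(B_{t_{i+1}} - B_{t_i})$ and compare the sample covariance with $(\phi, \psi)_{L^2}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of E(X_phi X_psi) = (phi, psi)_{L^2} on [0, 1].
n, n_samples = 500, 10000
dt = 1.0 / n
t = dt * np.arange(n)                     # left endpoints for the Ito sums
phi = np.sin(np.pi * t)
psi = np.sin(np.pi * t) + np.sin(2 * np.pi * t)

dB = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n))  # Brownian increments
X_phi = dB @ phi                          # Ito sums per sample path
X_psi = dB @ psi

print(np.mean(X_phi * X_psi))             # ~ (phi, psi)_{L^2}
print(np.sum(phi * psi) * dt)             # ~ 0.5, by orthogonality of the sines
```

Here $(\phi, \psi)_{L^2} = \int_0^1 \sin^2(\pi t)\,dt + \int_0^1 \sin(\pi t)\sin(2\pi t)\,dt = 1/2 + 0$, and the sample covariance agrees.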
Definition 5
Your definition of $W$ is the same (by stochastic integration by parts) as the definition of $X$ above. Thus, $W$ here is once again white noise ($W'$ is then the distributional derivative of white noise).
Definition 1
In this definition, while the realization of the process you get depends on the choice of basis, its (probability) distribution does not. You can think of white noise as any process with this distribution.
This definition must be understood in the sense of distributions (now referring to Schwartz distributions), as white noise is not defined pointwise (so $W_t$ is meaningless). A more precise definition is that $W$ acts on a test function $\phi$ by $W_\phi := (W, \phi) = \sum_{i=1}^\infty \xi_i (\phi, \phi_i)$. Now you can check that $W_\phi$ has mean $0$ and that
\begin{equation}
E(W_\phi W_\psi) = E\sum_{i,j=1}^\infty \xi_i \xi_j (\phi, \phi_i) (\psi, \phi_j) = \sum_{i=1}^\infty (\phi, \phi_i) (\psi, \phi_i) = (\phi, \psi)_{L^2},
\end{equation}
where the second equality uses $E(\xi_i \xi_j) = \delta_{ij}$ and the third is Parseval's identity.
Thus, the only thing to check to see that $W$ has the same distribution as the processes above is that it is Gaussian.
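A truncated version of this basis expansion is easy to simulate (my own sketch; the sine basis $\phi_i(t) = \sqrt{2}\sin(i\pi t)$ of $L^2([0,1])$ and the test functions are illustrative assumptions): draw iid $\xi_i \sim N(0,1)$, form $W_\phi = \sum_{i=1}^N \xi_i (\phi, \phi_i)$, and check the covariance against $(\phi, \psi)_{L^2}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Truncation of W_phi = sum_i xi_i (phi, phi_i) using the orthonormal
# basis phi_i(t) = sqrt(2) sin(i pi t) of L^2([0, 1]).
N, n_grid, n_samples = 50, 2000, 20000
t = np.linspace(0.0, 1.0, n_grid + 1)
dt = t[1] - t[0]

basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(np.arange(1, N + 1), t))
phi = t * (1.0 - t)                       # illustrative test functions
psi = np.sin(np.pi * t)

# Basis coefficients (phi, phi_i) via Riemann sums
c_phi = (basis * phi).sum(axis=1) * dt
c_psi = (basis * psi).sum(axis=1) * dt

xi = rng.normal(size=(n_samples, N))      # iid N(0, 1) coefficients
W_phi = xi @ c_phi
W_psi = xi @ c_psi

print(np.mean(W_phi * W_psi))             # ~ (phi, psi)_{L^2}
print(4.0 / np.pi**3)                     # exact inner product ~ 0.1290
```

Here $(\phi, \psi)_{L^2} = \int_0^1 t(1-t)\sin(\pi t)\,dt = 4/\pi^3$, and the truncated sum reproduces it because both functions are well captured by the first $N$ sine modes (Parseval again).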
Let's consider the first and the second term on the right-hand side of your equation separately:
First term:
If $(X_t)_{t \geq 0}$ and $(Y_t)_{t \geq 0}$ are semimartingales, then
$$\int_0^T f(t) \, d(X_t-Y_t) = \int_0^T f(t) dX_t - \int_0^T f(t) dY_t$$
for any (nice) mapping $f$, and therefore
$$d(X_t-Y_t) = dX_t - dY_t.$$
This implies that
$$E(X_t) \, d(X_t-\tfrac{1}{2} \langle X \rangle_t) = E(X_t) \, dX_t - \frac{1}{2} E(X_t) \, d\langle X \rangle_t. \tag{1}$$
Second term: By assumption, $X_t = X_0 + M_t + A_t$ is a semimartingale, and this implies that $$Y_t := X_t - \frac{1}{2} \langle X \rangle_t = X_0 + \underbrace{M_t}_{\text{martingale part}} + \underbrace{\left( A_t - \frac{1}{2} \langle X \rangle_t \right)}_{\text{finite variation part}}$$ is also a semimartingale. By the very definition of the bracket, $N_t = \langle Y \rangle_t$ is the unique finite variation process such that $M_t^2 - N_t$ is a continuous local martingale; only the martingale part contributes, and $X$ and $Y$ share the same martingale part $M$. Hence, $\langle Y \rangle_t = \langle M \rangle_t = \langle X \rangle_t$, i.e.
$$\langle X- \tfrac{1}{2} \langle X \rangle \rangle_t = \langle X \rangle_t.$$
Consequently, we get for the second term in your equation that
$$\frac{1}{2} E(X_t) \, d\langle X- \tfrac{1}{2} \langle X \rangle \rangle_t = \frac{1}{2} E(X_t) \, d\langle X \rangle_t. \tag{2}$$
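The claim that the finite variation part does not contribute to the bracket can be seen numerically (my own sketch, taking $X_t = B_t$ a Brownian motion, so $\langle X \rangle_t = t$): the realized quadratic variation of $Y_t = B_t - t/2$ over a fine partition of $[0,1]$ matches that of $B_t$ itself.

```python
import numpy as np

rng = np.random.default_rng(3)

# Realized quadratic variation of Y_t = B_t - t/2 vs. that of B_t on [0, 1]:
# the finite variation part -t/2 (= -<B>_t / 2) does not contribute.
n = 200000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.cumsum(dB)
t = dt * np.arange(1, n + 1)
Y = B - 0.5 * t

qv_B = np.sum(np.diff(np.concatenate(([0.0], B))) ** 2)
qv_Y = np.sum(np.diff(np.concatenate(([0.0], Y))) ** 2)

print(qv_B, qv_Y)   # both ~ <B>_1 = 1
```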
Combining $(1)$ and $(2)$ we find that
$$E(X_t) \, d(X_t-\tfrac{1}{2} \langle X \rangle_t) + \frac{1}{2} E(X_t) \, d\langle X- \tfrac{1}{2} \langle X \rangle \rangle_t = E(X_t) \, dX_t.$$
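This is the defining SDE of the stochastic exponential, and it can be verified on a simulated path (my own sketch, specializing to $X_t = B_t$ a Brownian motion, so $\langle X \rangle_t = t$ and $\mathcal{E}(X)_t = \exp(B_t - t/2)$): an Euler scheme for $dZ_t = Z_t \, dB_t$ should track the closed form along the same path.

```python
import numpy as np

rng = np.random.default_rng(4)

# Check that exp(B_t - t/2) solves dZ_t = Z_t dB_t, Z_0 = 1, by comparing
# the closed form against an Euler scheme driven by the same increments.
n = 100000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.cumsum(dB)
t = dt * np.arange(1, n + 1)

closed_form = np.exp(B - 0.5 * t)

# Euler scheme: Z_{k+1} = Z_k (1 + dB_k), so Z_k is a running product
Z = np.cumprod(1.0 + dB)

print(np.max(np.abs(Z - closed_form)))   # small discretization error
```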
Best Answer
You should think of the white noise $X_t$ as the distributional derivative of a Brownian motion $B_t$. Formally, $X_t \, dt = dB_t$, and the integral you are trying to compute becomes $\int \phi(t) \, dB_t$. This formal expression is given a rigorous meaning as the Ito integral of $\phi$ with respect to the Brownian motion $B$.
A well-known fact is that the Ito integral of a deterministic function in $L^2$ is always Gaussian. You can see this, for example, by computing its characteristic function. The fact that the variance is given by the squared $L^2$ norm is even easier to see and follows from the Ito isometry: \begin{equation} \mathbb E \left(\int \phi(t) \; dB_t\right)^2 = \mathbb E \int \phi(t)^2 \; dt = \int \phi(t)^2 \; dt = \|\phi\|^2_{L^2}. \end{equation}
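Both facts can be checked by simulation (my own sketch; the deterministic integrand $\phi(t) = t$ on $[0,1]$ is an illustrative choice, with $\|\phi\|_{L^2}^2 = 1/3$): approximate $\int_0^1 \phi(t)\,dB_t$ by Itô sums, then compare the sample variance with $\|\phi\|_{L^2}^2$ and the empirical characteristic function with that of $N(0, \|\phi\|_{L^2}^2)$.

```python
import numpy as np

rng = np.random.default_rng(5)

# I = int_0^1 t dB_t via Ito sums; I should be N(0, 1/3) since
# ||phi||_{L^2}^2 = int_0^1 t^2 dt = 1/3.
n, n_samples = 500, 20000
dt = 1.0 / n
t = dt * np.arange(n)                      # left endpoints
phi = t

dB = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n))
I = dB @ phi                               # Ito sums per sample path

u = 2.0
emp_cf = np.mean(np.exp(1j * u * I))       # empirical characteristic function
gauss_cf = np.exp(-0.5 * u**2 / 3.0)       # CF of N(0, 1/3) at u

print(I.var())                             # ~ 1/3
print(emp_cf.real, gauss_cf)               # these agree; imaginary part ~ 0
```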
Maybe see here for a more detailed discussion.