Those four statements are indeed quite different! To unpack their differences:
a) Notice that the processes in statements 1 and 3 are (essentially) the same: at each time $t$ they take the value of the partial sum of the observations up to time $t$, with jumps at the points of the form $i/n$. Both statements therefore consider the partial sum process as an element of $D[0,1]$, the space of càdlàg functions on $[0,1]$. Statement 1 is stronger than statement 3: while statement 3 basically says that the distribution of the partial sum process is close to that of a Brownian motion, statement 1 says that there exists a copy of the original partial sum process, defined on a potentially new probability space, and Brownian motions defined on the same space, that are close in probability. As such, and it is a worthwhile exercise to carry out, statement 1 can be used to prove statement 3 relatively easily, but not the other way around. Statement 1 belongs to a family of approximation results for stochastic processes known as "weak approximations"; have a look at the Skorokhod-Dudley-Wichura theorem, and see https://encyclopediaofmath.org/wiki/Skorokhod_theorem. While it may seem strange that all random variables must potentially be redefined on a new probability space, the necessity of doing so has a simple and understandable reason: the original sample space for the observations may simply not be rich enough to support a Brownian motion. Skorokhod's original proof works by constructing all variables on the sample space $(0,1)$ equipped with Lebesgue measure.
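To see statement 3 in action, here is a minimal simulation sketch (my own addition, not from the answer above; it assumes `numpy`/`scipy` and the sample sizes are arbitrary). It builds the càdlàg partial sum process and compares the distribution of its supremum with the Brownian motion quantity $P(\sup_{t\le 1} B_t \le x) = 2\Phi(x) - 1$ from the reflection principle:

```python
import numpy as np
from scipy.stats import norm

# Simulate the cadlag partial sum process S_{floor(nt)} / sqrt(n) on [0,1]
# and compare a sup-functional probability with the Brownian motion limit.
rng = np.random.default_rng(0)
n, reps = 500, 20_000

X = rng.standard_normal((reps, n))       # iid mean-0, variance-1 observations
S = np.cumsum(X, axis=1) / np.sqrt(n)    # process values at the jump points i/n
sup_S = np.maximum(S.max(axis=1), 0.0)   # sup over [0,1]; the process starts at 0

x = 1.0
print("empirical P(sup <= 1):", np.mean(sup_S <= x))
print("Brownian motion limit:", 2 * norm.cdf(x) - 1)   # = 0.6827
```

That the probability of a supremum functional converges is exactly the kind of consequence weak convergence in $D[0,1]$ (statement 3) delivers, since the supremum is continuous in the relevant topology.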
b) Statement 2 considers a modified partial sum process that, rather than having jumps, is continuously interpolated using linear interpolation. The processes in statements 1/3 and 2 agree at the points of the form $i/n$. The point of considering this process rather than the one in statements 1/3 is basically mathematical convenience: it takes values in the space $C[0,1]$ of continuous functions, which is a complete and separable metric space when equipped with the sup-norm $\|x-y\|=\sup_{t\in [0,1]}|x(t)-y(t)|$. Separability is a key tool in establishing many asymptotic results for measures defined on metric spaces, and the space $D[0,1]$ equipped with the sup-norm is NOT separable. As developed in Chapter 3 of Billingsley's 1968 book, one can instead define a metric on $D[0,1]$, called the Skorokhod metric, that makes $D[0,1]$ separable and with respect to which many functionals of statistical/probabilistic interest on $D[0,1]$ are continuous, thereby circumventing the need to transform the partial sum process into $C[0,1]$, which admittedly is kind of clunky.
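As a quick illustration of why nothing is lost asymptotically by interpolating (again a sketch of my own, assuming `numpy`): the sup-distance between the step process and its linear interpolation is the largest rescaled jump $\max_i |X_i|/\sqrt{n}$, which vanishes as $n$ grows.

```python
import numpy as np

# The step process and its linear interpolation agree at the points i/n,
# and differ by at most the largest jump max_i |X_i| / sqrt(n) in sup-norm.
rng = np.random.default_rng(1)
for n in (10**2, 10**4, 10**6):
    X = rng.standard_normal(n)
    print(n, np.abs(X).max() / np.sqrt(n))   # ~ sqrt(2 log n / n) -> 0
```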
An even slicker way of handling this has been developed more recently, sometimes called weak convergence in the Hoffmann-Jørgensen sense. In this framework weak convergence is defined using outer expectation, so that processes that are not continuous, such as the standard partial sum process, can have their weak convergence considered in $D[0,1]$ equipped with the sup-norm, since the weak limit, a Brownian motion, lives in the separable subspace $C[0,1]$. This theory is comprehensively developed in van der Vaart, A. W. and Wellner, J. A., Weak Convergence and Empirical Processes.
c) Statement 4 is a statement about weak convergence of the standard empirical process, which is analogous to statement 3 for the partial sum process. Donsker's original papers on the topic treat these two cases separately, and the development of results in this vein since then has often followed this pattern.
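Statement 4 is what underlies the Kolmogorov-Smirnov test: the scaled sup of the empirical process converges to the sup of the absolute value of a Brownian bridge (the Kolmogorov distribution). A simulation sketch of my own, assuming `numpy`/`scipy`:

```python
import numpy as np
from scipy.stats import kstwobign

# For Uniform(0,1) samples, sqrt(n) * sup_t |F_n(t) - t| is approximately
# Kolmogorov-distributed (scipy's kstwobign), i.e. distributed as the sup
# of the absolute value of a Brownian bridge.
rng = np.random.default_rng(2)
n, reps = 1000, 5000
U = np.sort(rng.uniform(size=(reps, n)), axis=1)
i = np.arange(1, n + 1)
D = np.maximum(i / n - U, U - (i - 1) / n).max(axis=1)  # sup_t |F_n(t) - t|
stat = np.sqrt(n) * D
print("empirical P(stat <= 1):", np.mean(stat <= 1.0))
print("Kolmogorov limit:      ", kstwobign.cdf(1.0))    # ~ 0.73
```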
I've taken a look at the book; this chapter is introductory and somewhat informal, so I imagine the authors are more specific in later chapters about what they mean by a white noise in space or time and what they mean by the S(P)DE in your question. Nevertheless, I have addressed aspects of your question below.
A discussion of Definitions 2, 3 and 5 is contained in an answer of mine to a similar question here. Everything in that answer is real-valued (which hopefully doesn't make too much of a difference) and indexed by a single real variable (or, more precisely, a test function of a single real variable); this can make a significant difference depending on what you want to know.
Definition 2
The random distribution that acts on $\phi$ via $(W, \phi) = \int \phi(t) B_t dt$ is just the Brownian motion $B$ (i.e. we can identify the function $B$ with the distribution $W$).
Your definition of $W'$ is then how I define white noise (denoted $X$) in the answer linked to above: white noise $X$ is defined as the random distribution that acts on a test function $\phi$ by $(X, \phi) = -\int_0^\infty B(t) \phi'(t)\, dt$. In the parlance of the book you cite, this is a white noise in time (time is the only variable in that answer). However, you can generalize this definition to white noise in space and time (see the discussion of Definition 3 below).
Definition 3
Here $W$ is your white noise (not $W'$ as in Definition 2).
To link this to Definition 2, set $d = 0$ (so there is no spatial component to the domain of $\phi$). With $X$ defined as above, $(X_\phi := (X, \phi) : \phi \in C^\infty([0, \infty)))$ is a centered Gaussian process with covariance $E(X_\phi X_\psi) = (\phi, \psi)_{L^2}$ (by the Itô isometry). The definition you have stated is a generalization to the case where the process is indexed by space and time (more precisely, by test functions of space and time).
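This covariance structure is easy to check numerically. Below is a Monte Carlo sketch of my own (the test functions are arbitrary choices vanishing at the endpoints, and `numpy` is assumed): it discretizes $(X, \phi) = -\int B(t)\,\phi'(t)\,dt$ on $[0,1]$ and verifies that the empirical covariance matches $(\phi, \psi)_{L^2}$.

```python
import numpy as np

# Discretize (X, phi) = -int_0^1 B(t) phi'(t) dt for test functions that
# vanish at the endpoints, and check E[(X,phi)(X,psi)] ~ (phi, psi)_{L^2}.
rng = np.random.default_rng(3)
m, reps = 1000, 5000
t = np.linspace(0.0, 1.0, m + 1)
dt = 1.0 / m

phi = np.sin(np.pi * t) ** 2       # vanishes at t = 0 and t = 1
psi = (t * (1.0 - t)) ** 2         # likewise
dphi = np.gradient(phi, dt)
dpsi = np.gradient(psi, dt)

dB = rng.standard_normal((reps, m)) * np.sqrt(dt)   # Brownian increments
B = np.concatenate([np.zeros((reps, 1)), np.cumsum(dB, axis=1)], axis=1)

X_phi = -(B * dphi).sum(axis=1) * dt   # (X, phi) = -int B phi' dt
X_psi = -(B * dpsi).sum(axis=1) * dt
print("MC covariance:", np.mean(X_phi * X_psi))
print("(phi,psi)_L2: ", (phi * psi).sum() * dt)
```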
Definition 5
Your definition of $W$ is the same (by stochastic integration by parts) as the definition of $X$ above. Thus, $W$ here is once again white noise ($W'$ is then the distributional derivative of white noise).
Definition 1
In this definition, while the realization of the process you get in this way depends on the choice of basis, its (probability) distribution is independent of basis. You can think of a white noise as any process with this distribution.
This definition must be understood in the sense of distributions (now referring to Schwartz distributions), as white noise is not defined pointwise (so $W_t$ is meaningless). A more precise definition is that $W$ acts on a test function $\phi$ by $W_\phi := (W, \phi) = \sum_{i=1}^\infty \xi_i (\phi, \phi_i)$. Now you can check that $W_\phi$ has mean $0$ and that
\begin{equation}
E(W_\phi W_\psi) = E\sum_{i,j=1}^\infty \xi_i \xi_j (\phi, \phi_i) (\psi, \phi_j) = \sum_{i=1}^\infty (\phi, \phi_i) (\psi, \phi_i) = (\phi, \psi)_{L^2},
\end{equation}
where the second equality uses $E(\xi_i \xi_j) = \delta_{ij}$ and the last is Parseval's identity.
Thus, the only thing to check to see that $W$ has the same distribution as the processes above is that it is Gaussian.
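The Parseval step in the display above can be seen concretely with a small deterministic check (a sketch of my own, using the orthonormal cosine basis $\{1, \sqrt{2}\cos(k\pi t)\}$ of $L^2[0,1]$ and two arbitrary test functions; `numpy` is assumed):

```python
import numpy as np

# Check Parseval numerically: sum_i (phi, e_i)(psi, e_i) -> (phi, psi)_{L^2}
# for the orthonormal cosine basis {1, sqrt(2) cos(k pi t)} of L^2[0,1].
m = 20_000
t = (np.arange(m) + 0.5) / m          # midpoint grid on [0,1]
inner = lambda f, g: np.mean(f * g)   # approximates int_0^1 f g dt

phi = np.exp(t)
psi = t ** 3 - t

total = inner(phi, np.ones(m)) * inner(psi, np.ones(m))
for k in range(1, 200):
    e_k = np.sqrt(2) * np.cos(k * np.pi * t)
    total += inner(phi, e_k) * inner(psi, e_k)
print("partial Parseval sum:", total)
print("(phi,psi)_L2:        ", inner(phi, psi))
```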
Best Answer
Let $$\frac{\Sigma_d}{\operatorname{Tr}\Sigma_d} = \mathrm{diag}(a_1, a_2, \cdots, a_d),$$ so that $\sum_{i=1}^d a_i = 1$ and hence $\mathbb{E}[\|x\|^2] = \sum_{i=1}^d a_i = 1$.
By Jensen's inequality (applied to the convex function $u \mapsto 1/u^2$), we have $$\mathbb{E}\left[\frac{1}{\|x\|^4}\right] \ge \frac{1}{(\mathbb{E}[\|x\|^2])^2} = 1. \tag{1}$$
Using the known identity ($q > 0$) $$\frac{1}{q^2} = \int_0^\infty t\, \mathrm{e}^{-tq} \, \mathrm{d} t,$$ we have \begin{align*} \mathbb{E}\left[\frac{1}{\|x\|^4}\right] &= \mathbb{E}\left[ \int_0^\infty t\,\mathrm{e}^{-t\|x\|^2} \,\mathrm{d} t \right]\\ &= \int_0^\infty t\, \mathbb{E}[\mathrm{e}^{-t\|x\|^2}]\, \mathrm{d} t \\ &= \int_0^\infty t\, \prod_{i=1}^d \mathbb{E}[\mathrm{e}^{-tx_i^2}]\, \mathrm{d} t\\ &= \int_0^\infty t\, \prod_{i=1}^d \frac{1}{\sqrt{1 + 2a_i t}} \,\mathrm{d} t \\ &= \int_0^\infty t\, \mathrm{e}^{-\frac12\sum_{i=1}^d \ln (1 + 2a_i t)} \,\mathrm{d} t \tag{2} \end{align*} where we use $\mathbb{E}[\mathrm{e}^{-tx_i^2}] = \int_{-\infty}^\infty \mathrm{e}^{-ty^2}\cdot \frac{1}{\sqrt{2\pi a_i}}\mathrm{e}^{-\frac{y^2}{2a_i}}\,\mathrm{d} y = \frac{1}{\sqrt{1 + 2a_i t}}$.
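As a sanity check on (2) (my addition, assuming `numpy`/`scipy` and taking the eigenvalues proportional to $1/i$, which is what Fact 1 below indicates), one can compare a Monte Carlo estimate of $\mathbb{E}[1/\|x\|^4]$ with the integral representation:

```python
import numpy as np
from scipy.integrate import quad

# Compare a Monte Carlo estimate of E[1/||x||^4] with the integral in (2),
# for a_i = (1/i) / H_d so that sum(a) = 1, i.e. E[||x||^2] = 1.
rng = np.random.default_rng(4)
d = 10
a = 1.0 / np.arange(1, d + 1)
a /= a.sum()

x = rng.standard_normal((200_000, d)) * np.sqrt(a)   # x ~ N(0, diag(a))
mc = np.mean(1.0 / np.sum(x * x, axis=1) ** 2)

integrand = lambda t: t * np.exp(-0.5 * np.sum(np.log1p(2 * a * t)))
val, _ = quad(integrand, 0, np.inf)
print("Monte Carlo :", mc)
print("integral (2):", val)
```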
Fact 1: $\ln(1 + 2a_i t) \ge \frac{1}{i}\ln (1 + 2a_1 t)$ for all $i$ and all $t \ge 0$.
(Proof: Note that $a_i = a_1/i$. Let $f(t) = \ln(1 + 2a_i t) - \frac{1}{i}\ln (1 + 2a_1 t)$. We have $f'(t) = \frac{4a_1^2 t (i - 1)}{i(2a_1t + i)(2a_1 t + 1)} \ge 0$. Also, $f(0) = 0$. The desired result follows.)
Denote $H_d = \sum_{i=1}^d \frac{1}{i}$, and note that $a_1 = 1/H_d$ (since $\sum_{i=1}^d a_i = a_1 H_d = 1$). By Fact 1, we have $$-\frac12\sum_{i=1}^d \ln (1 + 2a_i t) \le -\frac12 \ln(1 + 2a_1 t) \cdot \sum_{i=1}^d \frac{1}{i} = - \frac12 H_d\ln(1 + 2t/H_d). \tag{3}$$
From (3), we have $$ \int_0^\infty t\, \mathrm{e}^{-\frac12\sum_{i=1}^d \ln (1 + 2a_i t)} \,\mathrm{d} t \le \int_0^\infty t\, \mathrm{e}^{- \frac12 H_d\ln(1 + 2t/H_d)} \,\mathrm{d} t. \tag{4} $$
From (1), (2) and (4), we have $$1\le \mathbb{E}\left[\frac{1}{\|x\|^4}\right] \le \int_0^\infty t\, \mathrm{e}^{- \frac12 H_d\ln(1 + 2t/H_d)} \,\mathrm{d} t. $$
By the Dominated Convergence Theorem, we have $$\lim_{d\to \infty} \int_0^\infty t\, \mathrm{e}^{- \frac12 H_d\ln(1 + 2t/H_d)} \,\mathrm{d} t = \int_0^\infty t\,\mathrm{e}^{-t}\,\mathrm{d} t = 1.$$ (Note that $u \mapsto u \ln (1 + 2t/u)$ is non-decreasing on $u > 0$. Let $f_d(t) = t\, \mathrm{e}^{- \frac12 H_d\ln(1 + 2t/H_d)} = t(1 + 2t/H_d)^{-H_d/2}$. Then $f_2(t) \ge f_3(t) \ge \cdots$, $\lim_{d\to \infty} f_d(t) = t\,\mathrm{e}^{-t}$, and $\int_0^\infty f_m(t)\,\mathrm{d} t < \infty$ once $H_m > 4$. See: monotone convergence theorem for decreasing sequences.)
Thus, we have $$\lim_{d\to \infty} \mathbb{E}\left[\frac{1}{\|x\|^4}\right] = 1.$$
We are done.
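To see the convergence numerically (my addition, assuming `numpy`/`scipy`): the right-hand side of (4) can be evaluated in closed form, $\int_0^\infty t(1+2t/H_d)^{-H_d/2}\,\mathrm{d}t = H_d^2/((H_d-2)(H_d-4))$ for $H_d > 4$ (substitute $u = 2t/H_d$ and use a Beta integral), and both it and the exact value (2) tend to $1$, albeit slowly.

```python
import numpy as np
from scipy.integrate import quad

# E[1/||x||^4] via (2), and the upper bound (4) in closed form; both -> 1.
for d in (100, 1000, 10_000, 100_000):
    a = 1.0 / np.arange(1, d + 1)
    H_d = a.sum()
    a /= H_d                       # eigenvalues of Sigma_d / Tr(Sigma_d)
    exact = quad(lambda t: t * np.exp(-0.5 * np.sum(np.log1p(2 * a * t))),
                 0, np.inf)[0]
    bound = H_d ** 2 / ((H_d - 2) * (H_d - 4))   # valid since H_d > 4 here
    print(f"d={d}: E[1/||x||^4] = {exact:.4f}, bound (4) = {bound:.4f}")
```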