Those four statements are indeed quite different! To unpack their differences:
a) Notice that the processes in statements 1 and 3 are (essentially) the same: at each time $t$ they are given by the partial sum of the observations, with jumps at the points of the form $i/n$. Both statements therefore treat the partial sum process as an element of $D[0,1]$, the space of cadlag functions on $[0,1]$. Statement 1 is stronger than statement 3: while statement 3 essentially says that the distribution of the partial sum process is close to that of a Brownian motion, statement 1 says that there exists a copy of the original partial sum process, defined on a potentially new probability space, together with Brownian motions defined on that same space, that are close in probability. As such (and it is a worthwhile exercise to carry this out), statement 1 can be used to prove statement 3 relatively easily, but not the other way around. Statement 1 belongs to a family of approximation results for stochastic processes known as "weak approximations"; have a look at the Skorokhod-Dudley-Wichura theorem, and see https://encyclopediaofmath.org/wiki/Skorokhod_theorem. While it may seem odd that all random variables must potentially be redefined on a new probability space, the necessity of doing so has a simple and understandable reason: the original sample space for the observations may simply not be rich enough to support a Brownian motion. Skorokhod's original proof works by constructing all variables on the sample space $(0,1)$ equipped with Lebesgue measure.
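To make the distributional statement concrete, here is a minimal simulation sketch of statement 3 (the coupling in statement 1 cannot be exhibited this way). Everything below is an illustrative assumption rather than part of the statements themselves: Rademacher increments, the usual $n^{-1/2}$ scaling, and the sup functional, whose limit law, by the reflection principle, is that of $|Z|$ with $Z \sim N(0,1)$.

```python
# Sketch of statement 3: the sup of the cadlag partial sum process should be
# approximately distributed as sup_{[0,1]} B = |Z| in distribution, Z ~ N(0,1).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 5000

def sup_partial_sum():
    # Partial sum process: t -> n^{-1/2} * sum_{i <= floor(nt)} X_i,
    # a step function with jumps at the points i/n; its sup over [0,1]
    # is the max of the running sums (and 0, coming from t < 1/n).
    x = rng.choice([-1.0, 1.0], size=n)        # Rademacher: mean 0, variance 1
    return max(np.max(np.cumsum(x)), 0.0) / np.sqrt(n)

sups = np.array([sup_partial_sum() for _ in range(reps)])
target = np.abs(rng.standard_normal(reps))     # law of sup of Brownian motion

# The two sets of quantiles should be close for large n.
print(np.quantile(sups, [0.5, 0.9, 0.99]))
print(np.quantile(target, [0.5, 0.9, 0.99]))
```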
b) Statement 2 considers a modified partial sum process that, rather than having jumps, is continuously interpolated using linear interpolation. The processes in statements 1/3 and 2 agree at the points of the form $i/n$. The point of considering this process rather than the one in statements 1/3 is essentially mathematical convenience: it takes values in the space $C[0,1]$ of continuous functions, which is a complete and separable metric space when equipped with the sup-norm $\|x-y\|=\sup_{t\in [0,1]}|x(t)-y(t)|$. Separability is a key tool in establishing many asymptotic results for measures defined on metric spaces, and the space $D[0,1]$ equipped with the sup-norm is NOT separable. As developed in Chapter 3 of Billingsley's 1968 book, one can define a metric on $D[0,1]$, called the Skorokhod metric, that makes $D[0,1]$ separable and under which many functionals of statistical/probabilistic interest on $D[0,1]$ are continuous, thereby circumventing the need to transform the partial sum process into an element of $C[0,1]$, which admittedly is kind of clunky.
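To see how little the interpolation changes, here is a small sketch (standard normal increments, purely for illustration) that builds both versions from one and the same sample; their sup-distance is $\max_i |X_i|/\sqrt{n}$, which vanishes as $n\to\infty$ for iid finite-variance increments.

```python
# The D[0,1] step process (statements 1/3) vs the C[0,1] linearly
# interpolated process (statement 2), built from the same sample.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)
s = np.concatenate(([0.0], np.cumsum(x))) / np.sqrt(n)  # values at i/n

t = np.linspace(0.0, 1.0, 10 * n + 1)                 # fine evaluation grid
step = s[np.minimum(np.floor(n * t).astype(int), n)]  # cadlag step version
interp = np.interp(t, np.arange(n + 1) / n, s)        # statement 2 version

# The two agree at the points i/n; in between they differ by at most
# max_i |X_i| / sqrt(n), which tends to 0 in probability.
print(np.max(np.abs(step - interp)), np.max(np.abs(x)) / np.sqrt(n))
```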
An even slicker way of handling this has been developed more recently, sometimes called weak convergence in the Hoffmann-Jørgensen sense. In this framework weak convergence is defined using outer expectations, so that weak convergence of processes that are not continuous, such as the standard partial sum process, can still be considered; the key point is that the weak limit, a Brownian motion, lives in the separable space $C[0,1]$. This theory is comprehensively developed in van der Vaart, A. W. and Wellner, J. A., Weak Convergence and Empirical Processes.
c) Statement 4 is a statement about weak convergence of the standard empirical process, which is analogous to statement 3 for the partial sum process. Donsker's original papers on the topic treat these two cases separately, and the development of results in this vein since then has often followed this pattern.
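For a concrete analogue of the sketch in a), the following (again with illustrative tuning constants) compares the Kolmogorov statistic $\sqrt{n}\,\sup_t|F_n(t)-t|$ of a uniform sample with the sup of the absolute value of a Brownian bridge, the limit law one expects from the weak convergence in statement 4.

```python
# Empirical process alpha_n(t) = sqrt(n) * (F_n(t) - t) for uniforms; its
# weak limit is a Brownian bridge, so the Kolmogorov statistic should be
# approximately distributed as sup |bridge|.
import numpy as np

rng = np.random.default_rng(2)
n, reps, grid = 500, 2000, 1000

def kolmogorov_stat(n):
    u = np.sort(rng.uniform(size=n))
    i = np.arange(1, n + 1)
    # sup_t |F_n(t) - t| is attained at the jump points of F_n
    return np.sqrt(n) * np.max(np.maximum(i / n - u, u - (i - 1) / n))

def sup_bridge(m):
    b = np.cumsum(rng.standard_normal(m)) / np.sqrt(m)  # Brownian motion
    t = np.arange(1, m + 1) / m
    return np.max(np.abs(b - t * b[-1]))                # bridge: B_t - t*B_1

ks = np.array([kolmogorov_stat(n) for _ in range(reps)])
br = np.array([sup_bridge(grid) for _ in range(reps)])
print(np.quantile(ks, [0.5, 0.9]), np.quantile(br, [0.5, 0.9]))
```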
This answer uses the following fact.
If $X \sim \Gamma(\alpha,\lambda)$ and $Y \sim \Gamma(\beta,\lambda)$ are independent, then $X+Y \sim \Gamma(\alpha+\beta,\lambda)$.
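For completeness, this fact follows in one line from moment generating functions (using the shape-rate parameterization, consistent with the mean $\alpha/\lambda$ appearing below): for $t < \lambda$, $$M_X(t) = \mathbb{E}\left(e^{tX}\right) = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha},$$ so by independence $$M_{X+Y}(t) = M_X(t)\,M_Y(t) = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha + \beta},$$ which is the moment generating function of $\Gamma(\alpha+\beta,\lambda)$, and the MGF determines the distribution here.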
Hints: Let $(Y_i)_{i \in \mathbb{N}}$ be a sequence of independent identically distributed random variables such that $Y_i \sim \Gamma(\alpha,\lambda)$.
1. Show that $\tilde{X}_n := \sum_{i=1}^n Y_i$ satisfies $\tilde{X}_n \sim \Gamma(n \alpha,\lambda)$.
2. Apply the central limit theorem to prove that $$\frac{1}{\sqrt{n}} \left( \tilde{X}_n - \frac{n \alpha}{\lambda} \right) \stackrel{d}{\to} Z$$ for $Z \sim N(0,\sigma^2)$ with $\sigma^2 = \text{var}(Y_1)$; here $\stackrel{d}{\to}$ denotes convergence in distribution.
3. Use the fact that $\tilde{X}_n$ equals $X_n$ in distribution for each $n \in \mathbb{N}$ to conclude from Step 2 that $$\frac{1}{\sqrt{n}} \left( X_n - \frac{n \alpha}{\lambda} \right) \stackrel{d}{\to} Z.$$
4. Compute $\sigma^2 = \text{var}(Y_1)$ (...or look it up, e.g. on Wikipedia); the sketch after this list can serve as a numerical sanity check.
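A quick numerical sanity check of the steps above, as a sketch with arbitrary parameter values (note that numpy's gamma sampler uses the shape/scale convention, so the scale is $1/\lambda$):

```python
# Simulate X_n ~ Gamma(n*alpha, lambda), standardize as in Step 3, and
# compare the empirical moments with those of N(0, var(Y_1)).
import numpy as np

rng = np.random.default_rng(3)
alpha, lam, n, reps = 2.0, 3.0, 400, 20000

xn = rng.gamma(shape=n * alpha, scale=1.0 / lam, size=reps)
z = (xn - n * alpha / lam) / np.sqrt(n)

print(z.mean(), z.var())   # mean ~ 0, variance ~ var(Y_1) from Step 4
```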
The frame of the CLT is that one considers sums of i.i.d. random variables with a fixed distribution. When the number $n$ of summands becomes large, after centering (by the mean) and scaling (by $1/\sqrt{n}$), the limit in distribution is normal. The frame described in the revised version of the question is quite different, since one considers binomial distributions $B(n,p_n)$ where $n$ becomes large (just like before) and $p_n$ becomes small (quite different from before), with moreover $np_n\to\lambda$ for some positive $\lambda$.
In other words, the distribution of the individual summands now changes with $n$. No wonder the results are different! In the latter case, the limit in distribution (without centering or scaling) is Poisson with parameter $\lambda$, a result which is sometimes called the law of rare events.
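A small numerical illustration of the law of rare events, assuming scipy is available ($\lambda$ and $n$ below are arbitrary choices):

```python
# Compare the B(n, lambda/n) pmf with the Poisson(lambda) pmf; the maximal
# difference over the displayed range should be small for large n.
import numpy as np
from scipy.stats import binom, poisson

lam, n = 4.0, 1000
k = np.arange(0, 15)
print(np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam))))
```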
Edit: The newest version of the question concerns the generating function of the random variable $\xi_n=(S_n-m_n)/\sigma_n$ where $S_n$ is $B(n,p_n)$, $m_n=\mathbb E(S_n)$ and $\sigma_n^2=\mathrm{var}(S_n)$. Since the presentation of this context in the question seems rather confused, it may be useful to recall that $m_n=np_n$ and $\sigma_n^2=np_n(1-p_n)$, hence, in the asymptotics considered here, $m_n\to\lambda$ and $\sigma_n^2\to\lambda$. In particular, neither $m_n$ nor $\sigma_n^2$ grows linearly in $n$. Keeping this in mind, you might try to expand anew the generating function of $\xi_n$; one possible starting point is sketched below.
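One possible starting point for that expansion (a sketch, not the only route): the generating function of $S_n$ itself is $$\mathbb{E}\left(s^{S_n}\right) = (1 - p_n + p_n s)^n = \left(1 + \frac{np_n(s-1)}{n}\right)^n \longrightarrow e^{\lambda(s-1)},$$ the generating function of the Poisson distribution with parameter $\lambda$. Combined with $m_n\to\lambda$ and $\sigma_n^2\to\lambda$, this shows that $\xi_n$ converges in distribution to $(N-\lambda)/\sqrt{\lambda}$ with $N$ Poisson with parameter $\lambda$, not to a Gaussian.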