Any field that involves processes with noise, uncertainty, or probabilistic behavior will make use of concepts from stochastic processes.
Practically speaking, I've seen the associated theory applied to spacecraft dynamics (in the context of attitude estimation), celestial mechanics (in the context of tracking), meteorological phenomena (in the context of data assimilation), analytical mechanics (in the context of vibrational behavior), and electrical engineering (in the context of computer vision and stochastic signals).
These methods can also be important in data science/machine learning, although I have no experience with either.
Oftentimes, the ideas from stochastic processes are used in estimation schemes, such as filtering. These are methods used to propagate the moments of a probabilistic dynamical system. Since many systems are probabilistic (or have some associated uncertainty), these methods are applicable to a wide class of problems.
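As a concrete (if toy) illustration of a filter propagating moments, here is a sketch of a scalar Kalman filter for a linear-Gaussian system. The model, the parameter values (`a`, `q`, `r`), and the function names are all illustrative assumptions, not taken from any particular application above:

```python
import random

# Illustrative scalar linear-Gaussian model (all values assumed):
#   state:       x_k = a * x_{k-1} + w_k,  w_k ~ N(0, q)
#   measurement: y_k = x_k + v_k,          v_k ~ N(0, r)
a, q, r = 0.9, 0.1, 0.5

def kalman_step(mean, var, y):
    """One predict/update cycle: propagate the first two moments."""
    # Predict: push the mean and variance through the linear dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: condition the predicted Gaussian on the noisy measurement y.
    gain = var_pred / (var_pred + r)
    mean_post = mean_pred + gain * (y - mean_pred)
    var_post = (1.0 - gain) * var_pred
    return mean_post, var_post

# Usage: run the filter on a short synthetic trajectory.
random.seed(0)
x, mean, var = 1.0, 0.0, 1.0  # true state and prior moments
for _ in range(50):
    x = a * x + random.gauss(0.0, q ** 0.5)  # simulate dynamics
    y = x + random.gauss(0.0, r ** 0.5)      # simulate measurement
    mean, var = kalman_step(mean, var, y)
print(mean, var)
```

Note that the variance recursion here does not depend on the data at all; it converges to the fixed point of the associated Riccati recursion, which is one sense in which the filter "propagates the moments" of the system.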
When framing some problems, quantities which are in reality deterministic can be treated as probabilistic. This is sometimes done in parameter estimation (or system identification, which is similar), which seeks to estimate some deterministic parameter based on input data.
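To make the "deterministic quantity treated as probabilistic" idea concrete, here is a sketch of recursive Bayesian estimation of a fixed scalar parameter from noisy linear observations. The model, names (`bayes_update`, `true_a`), and numbers are illustrative assumptions:

```python
import random

def bayes_update(mean, var, x, y, noise_var):
    """Condition a Gaussian belief over a scalar parameter `a` on one
    observation y = a * x + noise, with noise ~ N(0, noise_var)."""
    gain = var * x / (var * x * x + noise_var)
    mean_new = mean + gain * (y - mean * x)
    var_new = var - gain * x * var
    return mean_new, var_new

random.seed(1)
true_a, noise_var = 2.5, 0.25      # the deterministic parameter and noise level
mean, var = 0.0, 10.0              # vague Gaussian prior over the parameter
for _ in range(200):
    x = random.uniform(-1.0, 1.0)              # input datum
    y = true_a * x + random.gauss(0.0, noise_var ** 0.5)  # noisy output
    mean, var = bayes_update(mean, var, x, y, noise_var)
print(mean, var)
```

The parameter `true_a` never changes, but the estimator carries a Gaussian belief over it whose variance shrinks as data accumulate; that is exactly the probabilistic treatment of a deterministic quantity described above.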
You can look up more references based on the fields and contexts, or ask me, but here are two examples as a proof of concept:
First, it is important to notice that (not unusually) Rogers and Williams have, at the point you are looking at, restricted their integrator $M$ to be a continuous $L^2$-bounded martingale. This means that at this stage, Brownian motion is not an allowed integrator, since it is not $L^2$-bounded. The punchline will be that, by localisation, once we have the theory for $L^2$-bounded martingales, we will be able to extend it to a wider class that includes Brownian motion.
This is important since it means that the class of martingales they consider satisfies $\mathbb{E}([M]_\infty) < \infty$. This will imply that constant functions are in $L^2(M)$, for example.
Now for the main body of your question. Let $U$ be the set of elementary processes in $L^2(M)$ (coinciding with the notation of Rogers/Williams). We want to show that $\overline{U} = L^2(M)$.
First we check that the space of bounded previsible processes $b\mathcal{E}$ satisfies $b \mathcal{E} \subseteq \overline{U}$.
For this, we use their Lemma 6.5 (the monotone class theorem). First, constant functions are in $\overline{U}$, since $\|c 1_{[0,n]} - c\|_{L^2(M)}^2 = c^2 \, \mathbb{E}\big([M]_\infty - [M]_n\big) \to 0$ as $n \to \infty$, using $\mathbb{E}([M]_\infty) < \infty$.
For the second condition, suppose $H_n \in \overline{U}$ converge uniformly to $H$. For arbitrary $\varepsilon > 0$ we can pick an $N$ large enough that $|H_n(s,\omega) - H(s,\omega)| \leq \varepsilon$ for all $n \geq N$, $s \in (0,\infty)$ and $\omega \in \Omega$. Hence
$$\|H_n - H\|_{L^2(M)}^2 \leq \mathbb{E} \bigg [ \int_0^\infty \varepsilon^2 \, d [M]_s \bigg] = \varepsilon^2 \, \mathbb{E}([M]_\infty).$$ That is, uniform convergence implies convergence in $L^2(M)$, so $H \in \overline{U}$ as well, since $\overline{U}$ is closed.
Finally, it is an easy exercise to use the DCT to see that $\overline{U}$ satisfies condition $3$ in Lemma 6.5 also: if non-negative $H_n \in \overline{U}$ increase pointwise to a bounded $H$, then $|H_n - H|^2$ is dominated by the constant $\|H\|_\infty^2$, which is integrable against $d[M] \otimes d\mathbb{P}$ because $\mathbb{E}([M]_\infty) < \infty$, so $H_n \to H$ in $L^2(M)$.
Therefore, $b \mathcal{E} \subseteq \overline{U}$. You are right that general elements of $L^2(M)$ need not be bounded so we aren't quite finished. The point is that a general element of $L^2(M)$ can be approximated by functions in $b \mathcal{E}$.
The crudest way one might try to do this is to take $\phi \in L^2(M)$ and just brutally cut it off at height $N$. That is, let
$$\phi_N = \begin{cases} N \operatorname{sgn}(\phi) & |\phi| \geq N \\
\phi & \text{otherwise} \end{cases}$$
Then $\phi_N$ is previsible and certainly bounded, and it follows by the DCT that $\phi_N \to \phi$ in $L^2(M)$. Hence $L^2(M) \subseteq \overline{b \mathcal{E}} \subseteq \overline{U}$, as desired.
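For completeness, the domination needed for the DCT can be spelled out, using only facts already in play. Since $|\phi_N| \leq |\phi|$ pointwise,
$$|\phi_N - \phi|^2 \leq \big(|\phi_N| + |\phi|\big)^2 \leq 4 |\phi|^2,$$
and $4|\phi|^2$ is integrable against $d[M] \otimes d\mathbb{P}$ precisely because $\phi \in L^2(M)$. Combined with the pointwise convergence $\phi_N \to \phi$, the DCT gives
$$\|\phi_N - \phi\|_{L^2(M)}^2 = \mathbb{E}\bigg[\int_0^\infty |\phi_N - \phi|^2 \, d[M]_s\bigg] \to 0.$$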