Since $\|A\|_{op}\le \|A\|_F=\sqrt{\operatorname{tr}(A^{\top}A)}$,
$$
\mathsf{E}\|A\|_{op}\le \mathsf{E}\sqrt{\sum_{i=1}^n\sum_{j=1}^m A_{ij}^2}\overset{(1)}{\le} \sqrt{\sum_{i=1}^n\sum_{j=1}^m \mathsf{E}A_{ij}^2}\overset{(2)}{\le} 2\sigma\sqrt{nm},
$$
where $(1)$ follows from Jensen's inequality and $(2)$ follows from Theorem 2.1.1 in the lecture notes.
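As a sanity check, this chain of inequalities is easy to probe numerically; the sketch below assumes standard Gaussian entries (so $\sigma=1$), which is purely an illustrative choice:

```python
import numpy as np

# Monte Carlo sanity check of E||A||_op <= E||A||_F <= 2*sigma*sqrt(n*m),
# assuming standard Gaussian entries (sigma = 1) purely for illustration.
rng = np.random.default_rng(0)
n, m, sigma, trials = 30, 20, 1.0, 200

op_norms, fro_norms = [], []
for _ in range(trials):
    A = sigma * rng.standard_normal((n, m))
    op_norms.append(np.linalg.norm(A, ord=2))       # largest singular value
    fro_norms.append(np.linalg.norm(A, ord='fro'))  # Frobenius norm

e_op, e_fro = np.mean(op_norms), np.mean(fro_norms)
bound = 2 * sigma * np.sqrt(n * m)
print(f"E||A||_op ~ {e_op:.2f} <= E||A||_F ~ {e_fro:.2f} <= {bound:.2f}")
```

The operator norm typically sits far below the Frobenius norm, which is why the $\sqrt{nm}$ rate is loose.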
To get a tighter bound involving $\sqrt{n+m}$, one may use the following tail bound (Theorem 4.4.5 in Vershynin, R., High-Dimensional Probability):
$$
\mathsf{P}(\|A\|_{op}>CK(\sqrt{n+m}+t))\le e^{-t^2}, \quad t>0,
$$
where $C>0$ is a universal constant and $K\equiv\|A_{11}\|_{\psi_2}=\inf\{t>0:\mathsf{E}\exp(A_{11}^2/t^2)\le 2\}$ is the sub-Gaussian (Orlicz) norm of the entries.
Using this bound,
\begin{align}
\mathsf{E}\|A\|_{op}&=\int_0^{\infty}\mathsf{P}(\|A\|_{op}>s)\,ds\le CK\sqrt{n+m}+\int_{CK\sqrt{n+m}}^{\infty}\mathsf{P}(\|A\|_{op}>s)\,ds \\
&\le CK\sqrt{n+m}+CK\int_{0}^{\infty}e^{-t^2}\,dt=CK\left(\sqrt{n+m}+\frac{\sqrt{\pi}}{2}\right)\le C'K\sqrt{n+m},
\end{align}
where the second inequality follows from the substitution $s=CK(\sqrt{n+m}+t)$ together with the tail bound above.
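The $\sqrt{n+m}$ rate (as opposed to $\sqrt{nm}$) can also be seen empirically; here is a rough sketch, again assuming Gaussian entries for illustration:

```python
import numpy as np

# Rough empirical check that E||A||_op grows like sqrt(n + m): scaling both
# dimensions by 4 should roughly double the norm (Gaussian entries assumed).
rng = np.random.default_rng(1)

def mean_op_norm(n, m, trials=100):
    return np.mean([np.linalg.norm(rng.standard_normal((n, m)), ord=2)
                    for _ in range(trials)])

small = mean_op_norm(20, 20)
large = mean_op_norm(80, 80)
print(f"ratio = {large / small:.2f}  (sqrt scaling predicts ~2, not ~4)")
```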
Finally, $K$ can be found using the Orlicz condition in Theorem 2.1.1 in the lecture notes.
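For example, if the entries are standard Gaussian (an illustrative assumption), the Orlicz condition can be solved in closed form: $\mathsf{E}\exp(g^2/t^2)=(1-2/t^2)^{-1/2}$ for $t^2>2$, and setting this equal to $2$ gives $K=\sqrt{8/3}$. A quick quadrature check:

```python
import numpy as np

# If the entries are standard Gaussian, E exp(g^2/t^2) = (1 - 2/t^2)^(-1/2)
# for t^2 > 2, so solving E exp(g^2/t^2) = 2 gives K = sqrt(8/3).
x = np.linspace(-12.0, 12.0, 400001)
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density

def orlicz_moment(t):
    # E exp(g^2 / t^2), approximated by a Riemann sum
    return np.sum(np.exp(x**2 / t**2) * phi) * (x[1] - x[0])

K = np.sqrt(8 / 3)
print(f"E exp(g^2/K^2) ~ {orlicz_moment(K):.4f}")  # close to 2
```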
A short answer is: you don't need an additional assumption. To elaborate, in order to deduce $\mathbb{E}[\sup_{t, s\in T\colon d(t, s)<\delta}X_t - X_s]\to 0$ as $\delta\to 0$, I think separability of the process suffices.
Recall that a process is separable if there exists a countable $T_0 \subseteq T$ such that (outside a null set) for every $t \in T$ there is a sequence $(t_n)$ in $T_0$ with $d(t_n, t) \to 0$ and $X_{t_n} \to X_t$.
This is the basic assumption needed for Dudley's entropy bound, and it essentially gives $\sup_{t\in T_0} X_t = \sup_{t \in T} X_t$. Here, the metric space may be a pseudo-metric space, i.e., $d$ need not satisfy $d(x, y) > 0$ for $x \neq y$.
On the other hand, from what I understand, the logic is clearer if you go in the other direction. First, the latter bound is just the usual Dudley entropy integral bound:
(Dudley entropy bound) Let $\{X_t\}_{t\in T}$ be a centered and separable sub-Gaussian process on $(T, d)$ w.r.t. $d$. Then,
$$
\mathbb{E}\left[\sup_{t\in T} X_t\right]
\leq C\int_{0}^{\mathrm{diam}(T)}\sqrt{\log N(T, d, \epsilon)}\,\mathrm{d}\epsilon.
$$
In view of this, by the triangle inequality, you get the bound for $X_t - X_s$ with a doubled constant $2C$. As far as I know, $C = 12$, but it doesn't really matter here.
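As an aside, the entropy integral is straightforward to evaluate numerically. The sketch below does so for the canonical Gaussian process $X_t=\langle g,t\rangle$ on the unit Euclidean ball, using the standard volumetric covering bound $N(\mathcal{B}_2^k,d,\epsilon)\le(1+2/\epsilon)^k$; all of these modeling choices are illustrative assumptions, not something from the question:

```python
import numpy as np

# Dudley entropy integral for the canonical Gaussian process X_t = <g, t>
# on T = B_2^k with d(t, s) = |t - s|_2, where E sup_t X_t = E|g|_2.
# Uses the volumetric covering bound N(B_2^k, d, eps) <= (1 + 2/eps)^k.
rng = np.random.default_rng(2)
k = 5

# Monte Carlo estimate of E sup_{t in B_2^k} <g, t> = E |g|_2.
e_sup = np.mean([np.linalg.norm(rng.standard_normal(k)) for _ in range(5000)])

# Entropy integral from 0 to diam(T) = 2 via a Riemann sum.
eps = np.linspace(1e-4, 2.0, 20000)
integrand = np.sqrt(k * np.log1p(2.0 / eps))
dudley = np.sum(integrand) * (eps[1] - eps[0])

print(f"E sup ~ {e_sup:.2f}, entropy integral ~ {dudley:.2f}")
# Here the integral dominates even with C = 1; the theorem only promises
# the bound with some universal constant C.
```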
You can then obtain the finite-resolution form of the above, i.e., split the bound by truncating the integral into $\int_{0}^{\delta} + \int_{\delta}^{\mathrm{diam}(T)}$. To be more specific, consider a minimal $\delta$-net $\mathcal{N}$ of $T$ and decompose $X_t - X_s$ into
$$
X_t - X_s
= X_t - X_{\pi(t)} + X_{\pi(t)} - X_{\pi(s)} + X_{\pi(s)} - X_s,
$$
where $\pi(x)$ is the nearest element to $x$ in $\mathcal{N}$. You can further bound the right-hand side by $2\sup_{t, s \in T\colon d(t, s)\leq \delta} (X_t - X_s) + \sup_{t', s' \in \mathcal{N}} (X_{t'} - X_{s'})$, which is probably a standard argument also used in your textbook. Then you handle the second term by the usual Dudley entropy bound with truncation. This gives what you started with.
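The decomposition through $\pi$ can be checked mechanically. Below is a toy sketch for the canonical Gaussian process $X_t=\langle g,t\rangle$ on a finite $T\subset\mathbb{R}^2$ with a greedily built $\delta$-net; every parameter here is made up for illustration:

```python
import numpy as np

# Toy check of the chaining decomposition: for every draw,
#   sup_{t,s} (X_t - X_s) <= 2 * max_t |X_t - X_pi(t)| + sup over net pairs,
# for X_t = <g, t> on a finite T in R^2 (all parameters are illustrative).
rng = np.random.default_rng(3)
T = rng.uniform(size=(200, 2))
delta = 0.2

# Greedy delta-net: every point of T ends up within delta of some net point.
net = []
for t in T:
    if all(np.linalg.norm(t - q) > delta for q in net):
        net.append(t)
net = np.array(net)

# pi(t) = nearest net point (so d(t, pi(t)) <= delta by construction).
dists = np.linalg.norm(T[:, None, :] - net[None, :, :], axis=2)
pi = net[np.argmin(dists, axis=1)]

for _ in range(100):
    g = rng.standard_normal(2)
    X, Xpi, Xnet = T @ g, pi @ g, net @ g
    lhs = X.max() - X.min()            # sup_{t,s} (X_t - X_s)
    osc = np.max(np.abs(X - Xpi))      # <= sup_{d(t,s) <= delta} (X_t - X_s)
    rhs = 2 * osc + (Xnet.max() - Xnet.min())
    assert lhs <= rhs + 1e-9
print("decomposition bound held on all draws")
```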
Best Answer
I think you can imitate the proof of Theorem 1.19 from your notes. Apologies if my approach is a little clumsy.
One can show that $\|A\| = \sup_{|u|_2 \le 1, |v|_2 \le 1} u^\top A v$. Then $E\|A\| = E[ \sup_{|u|_2\le 1, |v|_2 \le 1} u^\top A v]$.
One can obtain a $1/2$-net $\mathcal{N}^n$ of $\mathcal{B}_2^n$ with at most $6^n$ points. Similarly one obtains a $1/2$-net $\mathcal{N}^m$ of $\mathcal{B}_2^m$ of size at most $6^m$.
So writing $$u^\top A v = x^\top A y + (u-x)^\top A y + x^\top A (v-y) + (u-x)^\top A (v-y),$$ where $x \in \mathcal{N}^n$, $y \in \mathcal{N}^m$, $|u-x|_2 \le 1/2$, and $|v-y|_2 \le 1/2$, yields $$E[\sup_{u \in \mathcal{B}_2^n, v \in \mathcal{B}_2^m} u^\top A v] \le E[\sup_{x \in \mathcal{N}^n, y \in \mathcal{N}^m} x^\top A y] + E[\sup_{x \in \mathcal{N}^n, v \in \mathcal{B}_2^m/2} x^\top A v] + E[\sup_{u \in \mathcal{B}_2^n/2, y \in \mathcal{N}^m} u^\top A y] + E[\sup_{u \in \mathcal{B}_2^n/2, v \in \mathcal{B}_2^m/2} u^\top A v]. $$ Since the last term equals $\frac{1}{4} E[\sup_{u \in \mathcal{B}_2^n, v \in \mathcal{B}_2^m} u^\top A v]$ by homogeneity, rearranging leads to $$\frac{3}{4} E[\sup_{u \in \mathcal{B}_2^n, v \in \mathcal{B}_2^m} u^\top A v] \le E[\sup_{x \in \mathcal{N}^n, y \in \mathcal{N}^m} x^\top A y] + E[\sup_{x \in \mathcal{N}^n, v \in \mathcal{B}_2^m/2} x^\top A v] + E[\sup_{u \in \mathcal{B}_2^n/2, y \in \mathcal{N}^m} u^\top A y].$$
The first term on the right-hand side is the expected maximum of at most $6^{n+m}$ sub-Gaussian random variables, each with variance proxy $\sigma^2$ (since $|x|_2, |y|_2 \le 1$), so it is $\le \sigma \sqrt{2 (m+n) \log 6}$.
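This uses the standard maximal inequality $E\max_{i\le N} Z_i \le \sigma\sqrt{2\log N}$ for centered sub-Gaussian $Z_i$; a quick numerical check with Gaussian variables, an illustrative special case:

```python
import numpy as np

# Check of the maximal inequality E max_{i<=N} Z_i <= sigma*sqrt(2 log N)
# with Gaussian Z_i (sigma = 1); N stands in for 6^(n+m) with n = m = 2.
rng = np.random.default_rng(5)
N, sigma, trials = 6**4, 1.0, 300

e_max = np.mean([np.max(sigma * rng.standard_normal(N))
                 for _ in range(trials)])
bound = sigma * np.sqrt(2 * np.log(N))
print(f"E max ~ {e_max:.2f} <= {bound:.2f}")
```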
I believe you can bound the other two terms by doing a further net argument and obtaining the same $c \sigma \sqrt{m+n}$ rate. Finally $\sqrt{m+n} \le \sqrt{m} + \sqrt{n}$.
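For intuition, a small $1/2$-net of the kind used above can be produced by greedy selection; the sketch below does this over random points of the ball in a low dimension and checks the size against $6^n$ (the random sampling is an illustrative stand-in for a true net construction over the whole ball):

```python
import numpy as np

# Greedy construction of a 1/2-separated set over random points of B_2^n;
# by a packing argument its size is at most 6^n.  The random sampling is an
# illustrative stand-in for a true net of the whole ball.
rng = np.random.default_rng(4)
n, eps = 2, 0.5

# Uniform samples from the unit ball: random direction times radius^(1/n).
pts = rng.standard_normal((5000, n))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pts *= rng.uniform(size=(5000, 1)) ** (1.0 / n)

net = []
for p in pts:
    if all(np.linalg.norm(p - q) > eps for q in net):
        net.append(p)

# Every sampled point is within eps of the selected set, so the set is an
# eps-net of the samples; its cardinality respects the 6^n bound.
print(f"net size = {len(net)} <= 6^{n} = {6**n}")
```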