The answer depends, as you guessed, on the process $u \mapsto \sigma_u$.
You can strengthen your Itô-isometry approach by using the BDG (Burkholder-Davis-Gundy) inequality:
\begin{align}
\mathbb{E}[|\int_0^t\sigma_udW_u - & \int_0^s\sigma_u \,dW_u|^{2p}] \stackrel{\text{BDG}}{\leq} c(p) \mathbb{E}[(\int_s^t |\sigma_u|^2 \, du)^p] \\
& \leq c(p) \mathbb{E}[(t-s)^{p-1} \int_s^t |\sigma_u|^{2p} \, du] \quad (\text{Hölder-ineq. on} \int_s^t)\\
& = c(p) (t-s)^{p-1} \int_s^t \mathbb{E} [|\sigma_u|^{2p} ]\, du \, .
\end{align}
Here the factor $(t-s)^{p-1}$ comes from applying Hölder's inequality against the constant integrand $1$ (this requires $p \geq 1$). Now you can impose an integrability condition on $u \mapsto \sigma_u$ (bounded, bounded in $L^{2p}$ uniformly in $u$, etc.) and continue with the Kolmogorov-Centsov theorem.
Since you asked about SDEs, however, I think it suffices to consider the following situation:
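As a quick worked instance, assume purely for illustration that $|\sigma_u| \leq K$ almost surely for some constant $K$ (a symbol introduced here, not taken from your question) and that $p > 1$. The estimate above then reads
$$ \mathbb{E}\Big[\Big|\int_s^t \sigma_u \, dW_u\Big|^{2p}\Big] \leq c(p) K^{2p} (t-s)^{p} \, . $$
The Kolmogorov-Centsov theorem turns a bound of the form $\mathbb{E}[|Y_t - Y_s|^{\alpha}] \leq C |t-s|^{1+\beta}$ into local Hölder continuity of every exponent $\gamma < \beta/\alpha$. With $\alpha = 2p$ and $\beta = p-1$ this gives
$$ \gamma < \frac{p-1}{2p} = \frac{1}{2} - \frac{1}{2p} \, , $$
and letting $p \to \infty$ yields every exponent below $\frac{1}{2}$.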
$$ dX_t = \sigma(X_t) \, dW_t $$
where $\sigma:\mathbb{R} \to \mathbb{R}$ and $W$ is a standard Brownian motion. If we assume a further condition, called the linear growth condition on $\sigma$:
$$ |\sigma(x)| \leq c_1 (1+ |x|)\, , \quad x \in \mathbb{R} \, ,$$
then your claim of almost $\frac{1}{2}$-Hölder regularity (i.e. Hölder continuity of every exponent $\gamma < \frac{1}{2}$) is true. This condition is very natural if you want to ensure existence of a global solution (i.e. for all times $t\geq 0$).
Explanation:\
You need control over $\sup_{s\leq u \leq t} \mathbb{E}[|\sigma(X_u)|^{2p}]$ in that case.
Assuming this growth condition, one has the following estimate
$$ \mathbb{E}[|\sigma(X_u)|^{2p}] \leq (2c_1)^{2p} (1+ \mathbb{E}[|X_u|^{2p}] ) . $$
So we need the $2p$-th moment of $X_u$. Assuming $X_0 = 0$, we can bound it by applying the BDG inequality again:
\begin{align}
\mathbb{E} [|X_u|^{2p}] & = \mathbb{E} [ |\int_0^u \sigma(X_v) \, dW_v |^{2p}] \\
& \leq c(p) \mathbb{E}[ (\int_0^u |\sigma(X_v)|^2 \, dv)^{p} ] \\
& \leq c(p) \mathbb{E} [u^{p-1} \int_0^u |\sigma(X_v)|^{2p} \, dv ] \quad (\text{Hölder-ineq. on} \int_0^u) \\
& \leq c(p) c_1^{2p}u^{p-1} \mathbb{E} [ \int_0^u (1+|X_v|)^{2p} \, dv ]\\
& \leq c(p) (2c_1)^{2p} u^{p-1} (u + \int_0^u \mathbb{E} [|X_v|^{2p}] \,dv ).
\end{align}
Now apply Gronwall's lemma to obtain a bound on $\mathbb{E} [|X_u|^{2p}]$ and hence, via the growth estimate above, a bound $C(c,c_1,c_0,p,u)$ on $\mathbb{E}[|\sigma(X_u)|^{2p}]$ with the property that $C(c,c_1,c_0,p,u) \leq C(c,c_1,c_0,p,t)$ for $u \leq t$. Then you can continue:
\begin{align}
\mathbb{E}[|X_t - X_s|^{2p}] & \leq c(p) (t-s)^{p-1} \int_s^t C(c,c_1,c_0,p,u) \, du \\
& \leq c(p) (t-s)^{p-1} \int_s^t C(c,c_1,c_0,p,t) \, du \\
& \leq c(p) C(c,c_1,c_0,p,t) (t-s)^p.
\end{align}
This allows you to apply Kolmogorov-Centsov.
This also works fine for SDEs including a drift term $+b(X_t)\, dt$.
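For completeness, here is a sketch of how the Gronwall step in the driftless case can be made explicit. The shorthand $f(u) := \mathbb{E}[|X_u|^{2p}]$ and $K := c(p)(2c_1)^{2p}$ is introduced here for convenience, and I assume $p \geq 1$ and that $f$ is bounded on $[0,T]$ (usually justified by a stopping-time localization, which I omit). For fixed $T > 0$ and $0 \leq u \leq T$, the moment estimate gives
\begin{align}
f(u) & \leq K u^{p-1} \Big( u + \int_0^u f(v) \, dv \Big) \\
& \leq K T^{p} + K T^{p-1} \int_0^u f(v) \, dv \, ,
\end{align}
so Gronwall's lemma yields
$$ f(u) \leq K T^{p} \exp\big( K T^{p-1} u \big) \, , \quad 0 \leq u \leq T \, . $$
Choosing $T = u$ gives $f(u) \leq K u^{p} \exp(K u^{p})$, which is nondecreasing in $u$, as required for the monotonicity used above.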
Since you asked for references: my main source for this argument is an article by Dalang, although it deals with SPDE regularity: Theorem 13 in http://ejp.ejpecp.org/article/view/43/85
and also the appendix in Mytnik, Perkins and Sturm, 2006.
This kind of calculation can probably also be found in a textbook on stochastic analysis.
Best Answer
Edit: The question has been edited to ask for something stronger which this answer doesn't give. I'm leaving this here in case it inspires someone else/I work out a way to improve it to give the stronger claim.
Since $0 \leq 1_M \leq 1$, we really want to show that we can pick $\delta > 0$ to guarantee that for all $s \in [0,T]$ $$\mathbb{E}\bigg[\max_{s \le t \le (s+\delta) \wedge T} \bigg| \int_s^t \sigma_u \mathrm dW_u \bigg| \bigg] \le \varepsilon.$$
We can do this using an appropriate form of Doob's Martingale inequality. We have \begin{align} \mathbb{E}\bigg[\max_{s \le t \le (s+\delta) \wedge T} \bigg| \int_s^t \sigma_u \mathrm dW_u \bigg| \bigg] \leq& \mathbb{E}\bigg[\max_{s \le t \le (s+\delta) \wedge T} \bigg| \int_s^t \sigma_u \mathrm dW_u \bigg|^2 \bigg]^{\frac12} \\ \leq& 2\mathbb{E} \bigg[ \bigg(\int_s^{(s+ \delta) \wedge T} \sigma_u dW_u \bigg)^2 \bigg]^{\frac12} \\ =& 2\mathbb{E} \bigg[ \int_s^{(s+ \delta) \wedge T} \sigma_u^2 du \bigg]^{\frac12} \end{align} where the first inequality is just monotonicity of $L^p$-norms on probability spaces, the second is Doob's inequality and the last line then follows by the Ito isometry. It's clear that your assumptions on $\sigma$ let you choose $\delta$ to make this last quantity as small as you like, giving the desired result.
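To illustrate the choice of $\delta$: suppose, hypothetically, that your assumptions include a bound $|\sigma_u| \leq K$ for a constant $K > 0$ (the actual assumptions in the question may differ). Then
$$ 2\,\mathbb{E} \bigg[ \int_s^{(s+ \delta) \wedge T} \sigma_u^2 \, du \bigg]^{\frac12} \leq 2 K \sqrt{\delta} \, , $$
so any $\delta \leq \big(\tfrac{\varepsilon}{2K}\big)^2$ works, uniformly in $s \in [0,T]$.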