You're asking about maximal inequalities, which are known in greater generality for measure-preserving transformations. As has already been pointed out, in your special case you can expect a constant bound: the averages approach the limit very quickly, so you are essentially looking at the expectation of the maximum of the first few terms together with the limiting value.
In general, for measure-preserving transformations, if the $X_k$ are merely in $L^1$ (even in the iid case), the expectation of the supremum can diverge logarithmically. If the $X_k$ have higher moments, i.e. lie in $L^p$ for some $p>1$, then you get the bound $\|M((X_k))\|_p \le C_p \|X_1\|_p$, where $C_p$ is a constant that blows up like $1/(p-1)$ as $p\to 1$.
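For the iid special case, here is a minimal Monte Carlo sketch of this bound in Python; the constant $p/(p-1)$ used below is the Doob-type choice and is an assumption of the illustration, not the sharp constant in the ergodic setting:

```python
import numpy as np

# Illustrative check of the L^p maximal inequality ||M||_p <= C_p ||X_1||_p
# for running averages of iid variables. C_p = p/(p-1) is the Doob-type
# constant, used here only as an assumption for the sketch.
rng = np.random.default_rng(0)
p, n, trials = 2.0, 1000, 2000

X = rng.exponential(size=(trials, n)) - 1.0    # centered; in L^p for all p
averages = np.cumsum(X, axis=1) / np.arange(1, n + 1)
M = np.abs(averages).max(axis=1)               # maximal function of the averages

lhs = (M**p).mean() ** (1 / p)                                # empirical ||M||_p
rhs = (p / (p - 1)) * (np.abs(X[:, 0])**p).mean() ** (1 / p)  # C_p ||X_1||_p
print(f"||M||_p ~ {lhs:.3f}  vs  C_p ||X_1||_p ~ {rhs:.3f}")
```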
For your first question, Rigollet has a set of notes (with minor typos) that discusses the basics of this kind of tail bound. The result you mention in the "introduction" is in fact the classic Hoeffding bound, which Rigollet covers in these notes.
High Dimensional Statistics, Philippe Rigollet (2015)
http://www-math.mit.edu/~rigollet/PDFs/RigNotes15.pdf
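If you want to see the Hoeffding bound in action before relying on it, here is a minimal sketch assuming iid observations in $[0,1]$, in which case the bound reads $P(|\bar X_n - \mu| \ge t) \le 2e^{-2nt^2}$:

```python
import numpy as np

# Empirical sanity check of Hoeffding's inequality for iid X_i in [0, 1]:
# P(|mean - mu| >= t) <= 2 exp(-2 n t^2).
rng = np.random.default_rng(1)
n, trials, t = 100, 100_000, 0.1

means = rng.uniform(size=(trials, n)).mean(axis=1)   # uniform, so mu = 0.5
empirical = np.mean(np.abs(means - 0.5) >= t)
bound = 2 * np.exp(-2 * n * t**2)
print(f"empirical tail {empirical:.5f} <= Hoeffding bound {bound:.4f}")
```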
If you are concerned with order statistics, you usually want to restrict yourself to a narrower class of distributions; see, e.g., the classic paper [2].
For a general family of distributions it is almost impossible to obtain a tail bound on the order statistics, owing to the concentration of measure phenomenon. Another keyword you may want to look into is "U-statistics", because the (full) order statistic is an example of a U-statistic. The following quote from [1] is the best description of the power of U-statistics:
Theoretically, for these U-statistics we can study the whole spectrum of asymptotic problems which were investigated for independent variables. As a matter of fact, it is necessary to control the nature of dependence in order to obtain meaningful results. We restrict ourselves to exchangeable and weakly exchangeable variables, rank statistics, samplings from finite populations, weakly dependent random variables, bootstrap-variables, and to order statistics. ([1], p. 15)
Concentration bounds for U-statistics are a research subject that involves many false claims and technically demanding arguments, so while it is not too hard to find such a bound, one can never be too careful when using one.
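To make the term concrete, here is a minimal sketch of an order-2 U-statistic; the kernel $h(x,y) = (x-y)^2/2$, which recovers the unbiased sample variance, is a standard textbook example rather than anything specific to [1]:

```python
import numpy as np
from itertools import combinations

# An order-2 U-statistic averages a symmetric kernel h over all unordered
# pairs of observations. With h(x, y) = (x - y)^2 / 2 it is exactly the
# unbiased sample variance.
def u_statistic(x, h):
    pairs = list(combinations(x, 2))
    return sum(h(a, b) for a, b in pairs) / len(pairs)

rng = np.random.default_rng(2)
x = rng.normal(size=200)
u = u_statistic(x, lambda a, b: (a - b)**2 / 2)
print(f"U-statistic {u:.4f}  vs  np.var(x, ddof=1) = {np.var(x, ddof=1):.4f}")
```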
Update in response to Update 3.
Why is the maximal uniform spacing so small in magnitude compared to the maximal oscillation in the empirical distribution?
This is not surprising: you can always have very large spacings yet a small Kolmogorov-Smirnov norm. The KS norm is a distance on the space $\mathcal{M}(\mathbb{R})$ of probability measures, while the spacing is measured on $\mathbb{R}$ itself. If you look into Luc's argument, you will see his comments on this.
For the simplest example, take the two data sets $\{0, 0.1, \underbrace{0.9, \dots, 0.9}_{n\text{ copies}}, 1\}$ and $\{0, \underbrace{0.1, \dots, 0.1}_{n\text{ copies}}, 0.9, 1\}$. Their empirical measures have completely different CDFs, but both have the same maximal spacing of $0.8$. As the number $n$ of repeated observations increases, the KS norm between the two CDFs can be made arbitrarily close to $1$.
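A few lines of Python reproduce this (the helper functions are mine, for illustration): both samples keep a maximal spacing of $0.8$, while the KS distance between their empirical CDFs tends to $1$ as $n$ grows.

```python
import numpy as np

# Two samples with identical maximal spacing whose empirical CDFs
# drift apart in Kolmogorov-Smirnov distance as n grows.
def max_spacing(x):
    return np.diff(np.sort(x)).max()

def ks_distance(x, y):
    # Sup-norm distance between the two empirical CDFs, evaluated on the
    # pooled sample (enough, since both CDFs are step functions).
    grid = np.sort(np.concatenate([x, y]))
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.abs(Fx - Fy).max()

for n in (1, 10, 100):
    a = np.array([0.0, 0.1] + [0.9] * n + [1.0])
    b = np.array([0.0] + [0.1] * n + [0.9, 1.0])
    print(n, max_spacing(a), max_spacing(b), round(ks_distance(a, b), 3))
```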
References
[1] Korolyuk, Vladimir S., and Yu. V. Borovskich. Theory of U-Statistics. Vol. 273. Springer Science & Business Media, 2013.
[2] Das Gupta, S., et al. "Inequalities on the probability content of convex regions for elliptically contoured distributions." Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. 2, 1972.
Best Answer
This is a slightly more elementary version of Anthony Quas's answer, without appealing to martingales.
By the condition $\sum_j|a_j|^2<\infty$, the sequence of random variables $X_n:=\sum_{j=-n}^n a_j\epsilon_{-j}$ is Cauchy convergent in $L^2$ and hence in $L^1$. So, $X_n$ converges to a limit $X(0)$ in $L^1$. Since $EX_n=0$ for all $n$, we have $EX(0)=0$.
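Explicitly, assuming (as is implicit in the argument) that the $\epsilon_j$ are uncorrelated with mean $0$ and unit variance, for $m<n$ we have
$$E|X_n-X_m|^2=E\Big|\sum_{m<|j|\le n}a_j\epsilon_{-j}\Big|^2=\sum_{m<|j|\le n}|a_j|^2\to0\quad\text{as }m,n\to\infty,$$
since these are the tails of the convergent series $\sum_j|a_j|^2$.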
(In view of Kolmogorov's maximal inequality, $X_n\to X(0)$ almost surely as well.)