A simple way to do this is first create doubled word and then sort it, e.g.
$$\begin{aligned}
S &\to D_1D_2T \\
T &\to A_1A_2T \mid B_1B_2T \mid E \\
\alpha_2\beta_1 &\to \beta_1\alpha_2 \\
A_2E &\to E0 \\
B_2E &\to E1 \\
D_2E &\to F \\
A_1F &\to F0 \\
B_1F &\to F1 \\
D_1F &\to \varepsilon
\end{aligned}$$
for $\alpha, \beta \in \{A,B,D\}$. Intuitively, it first creates a starter mark $D_1$, then an end-mark of first word $D_2$, and then pairs of letters until the end $E$. The meta-rule sorts all $A_1$ before $A_2$, but without interchanging $A_1$ and $B_1$ or $A_2$ and $B_2$ (and same for $B$s and $D$s).
The $E$ stamp renders the $A_2$ and $B_2$ non-terminals and after reaching the middle it changes into $F$ that renders $A_1$ and $B_1$. Finally, after $F$ reaches the begging, it disappears.
Hope that helps ;-)
It’s minor sloppiness. Change the productions to
$$\begin{align*}
&S\to aTb\\
&S\to ab\\
&T\to aTb\\
&T\to ab\\
&S\to\lambda\;,
\end{align*}$$
and you have essentially the same grammar written in a form that meets the requirement.
Added: There are really two different notions involved here. Let $\Sigma$ be the set of terminal symbols and $\mathscr{N}$ the set of non-terminal symbols, and let $S$ be the initial symbol. A context-sensitive production is one of the form $\alpha A\beta\to\alpha\gamma\beta$, where $\alpha,\beta\in(\Sigma\cup\mathscr{N})^*$, $\gamma\in(\Sigma\cup\mathscr{N})^+$, and $A\in\mathscr{N}$; it gets its name because the replacement of $A$ by $\gamma$ occurs only in the context $\alpha\_\beta$. A monotonic production is one of the form $\alpha\to\beta$, where $\alpha,\beta\in(\Sigma\cup\mathscr{N})^*$ and $|\alpha|\le|\beta|$; it gets its name from the fact that the length of the string in a derivation is monotonically non-decreasing. It’s a theorem that a language has a grammar consisting entirely of context-sensitive productions if and only if it has a grammar consisting entirely of monotonic productions. Call such languages purely context-sensitive for the moment.
Unfortunately, these productions don’t allow us to generate any language that contains the empty word. We’d like to make the context-sensitive languages a superset of the context-free languages, some of which do contain the empty word. To do this, we allow the production $S\to\lambda$ provided that $S$ does not appear on the righthand side of any production. It’s a sort of one-time exception to allow us to generate the empty word; it’s not intended to let us generate anything else that wasn’t already able to be generated. In other words, we want to generate only purely context-sensitive languages and languages that would be purely context-sensitive if they didn’t include the empty word.
The reason for not allowing $S$ to appear on the righthand side of any production is that if we don’t make that restriction, we can write grammars that generate languages that would not be purely context-sensitive even if we threw away the empty word. In fact, we could generate every language that can be generated by any formal grammar whatsoever. In terms of the Chomsky hierarchy, we could generate not only all of the type $1$ languages, but also all of the type $0$ languages. The restriction ensures that we really do generate only the type $1$ languages, i.e., those that are purely context-sensitive and those that would be purely context-sensitive if they didn’t include the empty word.
Some grammars that violate the restriction are easily seen to be equivalent to grammars that do not violate it; that’s the case with the one that you asked about, as I showed by replacing it with the grammar above. When that can obviously done, the grammar is sometimes sloppily referred to as a context-sensitive grammar, even though technically it isn’t.
Best Answer
XML is a language defined by SGML, which is a restricted form of context free grammar (essentially a Dyck language with many types of parentesis). Such languages are (trivially) LL(1), thus very easy to parse.