It’s minor sloppiness. Change the productions to
$$\begin{align*}
&S\to aTb\\
&S\to ab\\
&T\to aTb\\
&T\to ab\\
&S\to\lambda\;,
\end{align*}$$
and you have essentially the same grammar written in a form that meets the requirement.
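For instance, in the rewritten grammar a derivation of $aaabbb$ runs
$$S\Rightarrow aTb\Rightarrow aaTbb\Rightarrow aaabbb\;,$$
using $S\to aTb$, then $T\to aTb$, then $T\to ab$; the empty word comes directly from $S\to\lambda$, and $S$ never appears on a righthand side.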
Added: There are really two different notions involved here. Let $\Sigma$ be the set of terminal symbols and $\mathscr{N}$ the set of non-terminal symbols, and let $S$ be the initial symbol. A context-sensitive production is one of the form $\alpha A\beta\to\alpha\gamma\beta$, where $\alpha,\beta\in(\Sigma\cup\mathscr{N})^*$, $\gamma\in(\Sigma\cup\mathscr{N})^+$, and $A\in\mathscr{N}$; it gets its name because the replacement of $A$ by $\gamma$ occurs only in the context $\alpha\_\beta$. A monotonic production is one of the form $\alpha\to\beta$, where $\alpha,\beta\in(\Sigma\cup\mathscr{N})^*$ and $|\alpha|\le|\beta|$; it gets its name from the fact that the length of the string in a derivation is monotonically non-decreasing. It’s a theorem that a language has a grammar consisting entirely of context-sensitive productions if and only if it has a grammar consisting entirely of monotonic productions. Call such languages purely context-sensitive for the moment.
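To make the contrast concrete with a pair of standard examples, consider
$$aAb\to acb\qquad\text{and}\qquad AB\to BA\;.$$
The first is a context-sensitive production: it replaces $A$ by $c$ in the context $a\_b$. The second is monotonic, since $|AB|=|BA|$, but it is not of context-sensitive form, because it rewrites two symbols at once. By the theorem just mentioned, though, any language generated using productions like $AB\to BA$ can also be generated with genuinely context-sensitive ones.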
Unfortunately, these productions don’t allow us to generate any language that contains the empty word. We’d like to make the context-sensitive languages a superset of the context-free languages, some of which do contain the empty word. To do this, we allow the production $S\to\lambda$ provided that $S$ does not appear on the righthand side of any production. It’s a sort of one-time exception to allow us to generate the empty word; it’s not intended to let us generate anything else that couldn’t already be generated. In other words, we want to generate only purely context-sensitive languages and languages that would be purely context-sensitive if they didn’t include the empty word.
The reason for not allowing $S$ to appear on the righthand side of any production is that if we don’t make that restriction, we can write grammars that generate languages that would not be purely context-sensitive even if we threw away the empty word. In fact, we could generate every language that can be generated by any formal grammar whatsoever. In terms of the Chomsky hierarchy, we could generate not only all of the type $1$ languages, but also all of the type $0$ languages. The restriction ensures that we really do generate only the type $1$ languages, i.e., those that are purely context-sensitive and those that would be purely context-sensitive if they didn’t include the empty word.
Some grammars that violate the restriction are easily seen to be equivalent to grammars that do not violate it; that’s the case with the one that you asked about, as I showed by replacing it with the grammar above. When that can obviously be done, the grammar is sometimes sloppily referred to as a context-sensitive grammar, even though technically it isn’t.
I think your solution is incomplete: how do you know that $w_1$ is in $L(G)$? Consider $w=aaabbb$; then $w_1=aabbb$, which is clearly not in the language.
I will put my answer below; if you want to think more about it, you can stop here.
From now on, by the property I mean: “every prefix of $w$ has at least as many $a$s as $b$s”.
The proof must consist of two parts: (1) proving that every string produced
by the grammar has the aforementioned property, and (2) proving that every
string with the property is produced by the grammar. This is the way I
would prove it.
For part one, suppose the property holds for all strings $u$ produced by
the grammar with $|u|<n$. Now suppose $|w|=n>0$; then the
first production in the derivation of $w$ must be $S\to aS$ or $S\to
aSbS$. In either case the $S$ symbols will be replaced by strings in
$L(G)$ of length less than $n$, and we know the property holds in such
strings. It is then easy to show that it must also hold in $w$.
For part two, suppose every string with the property of length less
than $n$ is produced by the grammar, and suppose $w$ with length $n>0$ has
the property. Then $w=av$ with $|v|<n$ (note that any non-empty string with
the property must start with $a$). If $v$ also has the property, then
$w$ can be made with a sequence of derivations starting with $S\to aS$.
If $v$ does not have the property, then some prefix of $v$ has more $b$s
than $a$s, and since the counts change by one symbol at a time, some
non-empty prefix of $w$ has exactly as many $a$s as $b$s. Take the
shortest such prefix; it must end in $b$ (the prefix one symbol shorter
has one more $a$ than $b$s), so $w=axby$, where $axb$ is that prefix and
$y$ can be $\epsilon$. Every shorter non-empty prefix of $w$ has strictly
more $a$s than $b$s, so after removing the leading $a$, every prefix of
$x$ still has at least as many $a$s as $b$s: $x$ has the property, and
$|x|<n$. Also, since $axb$ has equally many $a$s and $b$s and $w$ has the
property, every prefix of $y$ has at least as many $a$s as $b$s, so $y$
has the property too, and $|y|<n$. So $w$ can be made with a sequence of
derivations starting with $S\to aSbS$: the first $S$ makes $x$, and the
second one makes $y$.
P.S.: You should add base cases for the inductions; I didn’t state
them here.
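If it helps to check the two parts mechanically, here is a small sketch in Python; I’m assuming the grammar in question is $S\to aS\mid aSbS\mid\lambda$, and the helper names are mine.

```python
from itertools import product

def has_property(w):
    """Every prefix of w has at least as many a's as b's."""
    count = 0
    for c in w:
        count += 1 if c == 'a' else -1
        if count < 0:
            return False
    return True

def in_grammar(w):
    """Membership test that mirrors the induction: either w = av with
    v in L(G) (first step S -> aS), or w = a x b y with x and y in
    L(G) (first step S -> aSbS)."""
    if w == '':
        return True                    # S -> lambda
    if w[0] != 'a':
        return False
    if in_grammar(w[1:]):
        return True                    # S -> aS
    for i in range(1, len(w)):         # try every split w = a x b y
        if w[i] == 'b' and in_grammar(w[1:i]) and in_grammar(w[i + 1:]):
            return True                # S -> aSbS
    return False

# the two predicates agree on every string over {a, b} up to length 7
for n in range(8):
    assert all(has_property(w) == in_grammar(w)
               for w in map(''.join, product('ab', repeat=n)))
```

This is only a finite check, of course, not a proof, but it is a good way to catch a wrong decomposition before writing the induction.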
Best Answer
While it’s true that you will eventually eliminate the production $A\to\lambda$, you can’t just throw it away: by doing so, you’ve changed the language generated by the grammar. The original grammar allows the derivation $$S\Rightarrow aAb\Rightarrow ab\;,$$ while your new one cannot generate the word $ab$ at all. To see this, note that every one of your productions increases the length of the derived string, and the only terminal production is $A\to bb$; it takes at least one other step just to get an $A$, so your grammar cannot possibly produce any word of length less than $3$. (In fact the shortest possible length is $4$, for $abbb$.)
The idea is to adjust all productions that have $A$ on the righthand side in a way that compensates for losing the production $A\to\lambda$. If we have a production $X\to vAw$, where $v$ and $w$ are any strings of terminal and/or non-terminal symbols, the production $A\to\lambda$ allowed the derivation $X\Rightarrow vAw\Rightarrow vw$. If we throw out the production $A\to\lambda$, we lose this possibility, so to compensate we add the production $X\to vw$; this permits the derivation $X\Rightarrow vw$, which has the same effect as the original derivation $X\Rightarrow vAw\Rightarrow vw$ that is no longer available.
In particular, this means that we must add the production $S\to ab$ alongside the existing $S\to aAb$ and the production $B\to ba$ alongside the existing $B\to bAa$.
Taking care of $B\to AA$ is a little trickier, but it’s still not bad if you think in terms of compensating for loss of $A\to\lambda$. In the original grammar we have derivations
$$\begin{align*} &B\Rightarrow AA\Rightarrow A\lambda=A\text{ and}\\ &B\Rightarrow AA\Rightarrow\lambda A=A \end{align*}$$
that are no longer available when we throw out $A\to\lambda$, so we have to add $B\to A$. But we also have
$$B\Rightarrow AA\Rightarrow^*A\Rightarrow\lambda\;,$$
in which we apply the $\lambda$ production to both $A$’s, so we also need to add $B\to\lambda$. At this point we have the following productions:
$$\begin{align*} &S\to aAb\mid ab\mid BBa\\ &A\to bb\\ &B\to AA\mid A\mid\lambda\mid bAa\mid ba\;. \end{align*}$$
We got rid of $A\to\lambda$, but at the cost of introducing a new $\lambda$ production, $B\to\lambda$. That’s okay: we can repeat the process to get rid of $B\to\lambda$. The only production with $B$ on the righthand side is $S\to BBa$. The derivations that we lose by throwing away $B\to\lambda$ are $$S\Rightarrow BBa\Rightarrow Ba$$ and $$S\Rightarrow BBa\Rightarrow Ba\Rightarrow a\;,$$ so we can compensate for the loss of $B\to\lambda$ by adding productions $S\to Ba$ and $S\to a$:
$$\begin{align*} &S\to aAb\mid ab\mid BBa\mid Ba\mid a\\ &A\to bb\\ &B\to AA\mid A\mid bAa\mid ba\;. \end{align*}$$
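The whole procedure can be mechanized. Below is a sketch in Python of the standard algorithm (find the nullable symbols, then add every variant of a righthand side with some nullable occurrences deleted); the function name is mine, and it assumes every symbol is a single character, as in this grammar. It computes all nullable symbols up front instead of eliminating $A\to\lambda$ and then $B\to\lambda$ in two passes, but it yields the same productions.

```python
from itertools import combinations

def eliminate_lambda(productions):
    """Remove lambda-productions (represented by '') from a grammar,
    compensating as described above. Assumes one-character symbols;
    does not handle re-adding S -> lambda (not needed here, since S
    is not nullable)."""
    # 1. find all nullable non-terminals (those that can derive lambda)
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                    all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    # 2. for each body, add every variant with some nullable
    #    occurrences dropped
    new = {}
    for head, bodies in productions.items():
        result = set()
        for body in bodies:
            positions = [i for i, s in enumerate(body) if s in nullable]
            for r in range(len(positions) + 1):
                for drop in combinations(positions, r):
                    variant = ''.join(s for i, s in enumerate(body)
                                      if i not in drop)
                    if variant:   # skip lambda: drops the old lambda-
                        result.add(variant)  # productions, adds no new ones
        new[head] = result
    return new

grammar = {'S': {'aAb', 'BBa'}, 'A': {'bb', ''}, 'B': {'AA', 'bAa'}}
new_grammar = eliminate_lambda(grammar)
```

Running it on the grammar from the question reproduces exactly the productions displayed above: $S\to aAb\mid ab\mid BBa\mid Ba\mid a$, $A\to bb$, and $B\to AA\mid A\mid bAa\mid ba$.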