[Math] Lambda productions in grammar

automatacontext-free-grammarformal-languages

I tried removing the $\lambda$ productions from this grammar:

$S \rightarrow a A b \mid B B a$

$A \rightarrow b b \mid \lambda$

$B \rightarrow A A \mid b A a $

It seems like you just take away the 2nd part of the $A$ production since that's the only relevant here and then the rest of it is terminal one way or the other. But it seems like there's more to it:

$S \rightarrow a A b \mid B B a$

$A \rightarrow b b$

$B \rightarrow A A \mid b A a$

I think I'm on the right track here but not really 100% sure. Some guidance would be helpful.

Best Answer

While it’s true that you will eventually eliminate the production $A\to\lambda$, you can’t just throw it away: by doing so, you’ve changed the language generated by the grammar. The original grammar allows the derivation $$S\Rightarrow aAb\Rightarrow ab\;,$$ while your new one cannot generate the word $ab$ at all. To see this, note that every one of your productions increases the length of the derived string, and the only terminal production is $A\to bb$; it takes at least one other step just to get an $A$, so your grammar cannot possibly produce any word of length less than $3$. (In fact the shortest possible length is $4$, for $abbb$.)

The idea is to adjust all productions that have $A$ on the righthand side in a way that compensates for losing the production $A\to\lambda$. If we have a production $X\to vAw$, where $v$ and $w$ are any strings of terminal and/or non-terminal symbols, the production $A\to\lambda$ allowed the derivation $X\Rightarrow vAw\Rightarrow vw$. If we throw out the production $A\to\lambda$, we lose this possibility, so to compensate we add the production $X\to vw$; this permits the derivation $X\Rightarrow vw$, which has the same effect as the original derivation $X\Rightarrow vAw\Rightarrow vw$ that is no longer available.

In particular, this means that we must add the production $S\to ab$ alongside the existing $S\to aAb$ and the production $B\to ba$ alongside the existing $B\to bAa$.

Taking care of $B\to AA$ is a little trickier, but it’s still not bad if you think in terms of compensating for loss of $A\to\lambda$. In the original grammar we have derivations

$$\begin{align*} &B\Rightarrow AA\Rightarrow A\lambda=A\text{ and}\\ &B\Rightarrow AA\Rightarrow\lambda A=A \end{align*}$$

that are no longer available when we throw out $A\to\lambda$, so we have to add $B\to A$. But we also have

$$B\Rightarrow AA\Rightarrow^*A\Rightarrow\lambda\;,$$

in which we apply the $\lambda$ production to both $A$’s, so we also need to add $B\to\lambda$. At this point we have the following productions:

$$\begin{align*} &S\to aAb\mid ab\mid BBa\\ &A\to bb\\ &B\to AA\mid A\mid\lambda\mid bAa\mid ba\;. \end{align*}$$

We got rid of $A\to\lambda$, but at the cost of introducing a new $\lambda$ production, $B\to\lambda$. That’s okay: we can repeat the process to get rid of $B\to\lambda$. The only production with $B$ on the righthand side is $S\to BBa$. The derivations that we lose by throwing away $B\to\lambda$ are $$S\Rightarrow BBa\Rightarrow Ba$$ and $$S\Rightarrow BBa\Rightarrow Ba\Rightarrow a\;,$$ so we can compensate for the loss of $B\to\lambda$ by adding productions $S\to Ba$ and $S\to a$:

$$\begin{align*} &S\to aAb\mid ab\mid BBa\mid Ba\mid a\\ &A\to bb\\ &B\to AA\mid A\mid bAa\mid ba\;. \end{align*}$$