[Math] Grammar into Chomsky Normal Form

automatacontext-free-grammarformal-languages

Convert the following grammar into Chomsky Normal Form (CNF):

S → aS | aAA | bB

A → aA | λ

B → bB | aaB

I think this looks ok, but not sure. Maybe someone can point out where I go wrong:

remove lambda, unit, useless (B is the only useless) and spread productions to max 2 per piece

$S -> a$ $S$ | $a$ $A0$

$A0$ -> $A$ $A$

$A$ -> $a$ $A$ | $a$

Best Answer

What you have isn’t in Chomsky normal form: the productions $S\to aS$, $S\to aA_0$, and $A\to aA$ don’t meet the requirements. After removing the $\lambda$ production you should have

$$\begin{align*} &S\to aS\mid aA\mid a\mid bB\\ &A\to aA\mid a\\ &B\to bB\mid aaB \end{align*}$$

This has no unit productions, but the only productions that are compatible with Chomsky normal form are $S\to a$ and $A\to a$, so the others have to be cleaned up. Introduce new non-terminals $A_0$ and $B_0$ together with productions $A_0\to a$ and $B_0\to b$, and replace $a$ and $b$ in non-terminal productions by $A_0$ and $B_0$, respectively, to get this equivalent grammar:

$$\begin{align*} &S\to A_0S\mid A_0A\mid a\mid B_0B\\ &A\to A_0A\mid a\\ &B\to B_0B\mid A_0A_0B\\ &A_0\to a\\ &B_0\to b \end{align*}$$

This has two productions that are incompatible with Chomsky normal form, $S\to A_0S$ and $B\to A_0A_0B$: the former is bad because the initial symbol appears on the righthand side, and the latter is bad because the righthand side is too long. The first problem is fixed by adding a new non-terminal $S_0$, making it the initial symbol, and adding the production $S_0\to S$. The second is fixed by adding a new non-terminal $X$, replacing the production $B\to A_0A_0B$ with $B\to A_0X$, and adding the production $X\to A_0A_0$. The result is the following grammar in Chomsky normal form:

$$\begin{align*} &S_0\to S\\ &S\to A_0S\mid A_0A\mid a\mid B_0B\\ &A\to A_0A\mid a\\ &B\to B_0B\mid A_0X\\ &X\to A_0B\\ &A_0\to a\\ &B_0\to b \end{align*}$$

Related Question