[Math] Prove that there exists an equivalent grammar in Chomsky Normal Form like $G’$ such that $G’$ has at most $(K-1)|P|+|T|$ production rules

automatacontext-free-grammar

A context-free grammar (CFG) is a set of recursive rewriting rules (or productions) used to generate patterns of strings. A CFG consists of the following components: a set of terminal symbols, which are the characters of the alphabet that appear in the strings generated by the grammar.

Assume that $G=(V,T,S,P)$ is a context-free grammar without any $\lambda$-productions or unit-productions and for every production rule, $K$ is the maximum number of symbols in the RHS of the rule.

Prove that there exists an equivalent grammar in Chomsky Normal Form like $G'$ such that $G'$ has at most $(K-1)|P|+|T|$ production rules.

Note : I don't know $K$ is related to converting $G$ to $G'$. So, Any clue would be helpful.

Thanks in advance.

Best Answer

HINT: There are two slightly different definitions of Chomsky normal form. It appears that you’re using the one that simply requires each production to be of the form $X\to YZ$ or $X\to a$, rather than the one in Wikipedia; this causes no problems, since the lack of $\lambda$-productions means that $\lambda$ isn’t in the language anyway.

The algorithm given in that article for converting $G$ to $G'$ still works; we’re just in the fortunate situation of being able to ignore parts of it as not applicable. $G$ has no $\lambda$-productions or unit productions, and your definition of Chomsky normal form doesn’t require elimination of the initial symbol from the righthand side (the START step), so the last two steps, DEL and UNIT, don’t arise, and we need only perform the TERM and BIN steps.

  • Show that the TERM step adds at most $|T|$ productions to the grammar.

After the TERM step has been completed, every production is of one of two kinds: $X\to a$ (where $X$ might be one of the new non-terminals introduced in the TERM step), or $X\to Y_1\ldots Y_n$, where $n\ge 2$. Productions of the form $X\to Y_1Y_2$ are fine, but we have to get rid of the ones with $n\ge 3$.

  • Show that the procedure used in the BIN step replaces a production $X\to Y_1\ldots Y_n$ with $n-1$ productions of the form $Z\to UV$.
  • Conclude that $G'$ has at most $|T|+(K-1)|P|$ productions.

If you were using the definition of Chomsky normal form given in the Wikipedia article, the grammar with initial symbol $S$, terminal symbols $a$ and $b$, and productions $S\to Sa\mid bb$ would be a counterexample to the theorem. Clearly $|P|=|T|=K=2$, so $(K-1)|P|+|T|=4$. However, the smallest equivalent grammar in (this version of) Chomsky normal form has six productions:

$$\begin{align*} &S\to BB\mid XA\\ &X\to BB\mid XA\\ &A\to a\\ &B\to b \end{align*}$$

If we allow $S$ to appear on the righthand side, we can reduce this to four, as in the theorem:

$$\begin{align*} &S\to BB\mid SA\\ &A\to a\\ &B\to b \end{align*}$$

Related Question