The Language generated by the grammar

context-free-grammarformal-languages

Describe the language generated by the grammar G with such production rules:

$$\begin{align*}
S&\to a\mid bS\mid cSS\end{align*}$$

After thinking a bit we can get the next result (though it is not very formal):
$$\begin{align*}
S&\to a\mid cSS\mid b^nc^m(S)^{m + 1} \end{align*}$$

for some $$n, m \in \mathbb{N} $$ For every S n and m can be different.
I cannot understand what is this language.

Best Answer

Let $L$ be the language generated by $G$. The only terminal production is $S\to a$, so every word in $L$ must end in $a$. Moreover, every production generates a terminal symbol, so every $b$ in a word of $L$ must be followed by at least one more symbol, and every $c$ in a word of $L$ must be followed by at least two more symbols.

It’s easy to see that the words produced by derivations that do not use the production $S\to cSS$ are precisely the words of the form $b^na$ for $n\ge 0$: $n$ is the number of times the production $S\to bS$ is used before the derivation is terminated by the production $S\to a$. I’ll call words of $L$ of this form basic words.

Now suppose that a derivation uses the production $S\to cSS$ exactly once, after using $S\to bS$ $n_1$ times. At that point we have the derivation

$$S\Rightarrow^{n_1}b^{n_1}S\Rightarrow b^{n_1}cSS\,.$$

Since the derivation uses $S\to cSS$ only once, each of those last two $S$s must produce a basic word, so we’ll end up generating a word $b^{n_1}cb^{n_2}ab^{n_3}a$ for some $n_1,n_2,n_3\ge 0$.

What happens if the derivation uses $S\to cSS$ twice? Depending on whether the second application of $S\to cSS$ derives from the first or the second $S$ in $b^{n_1}cSS$, we can get either $b^{n_1}cb^{n_2}cb^{n_3}ab^{n_4}ab^{n_5}a$ or $b^{n_1}cb^{n_2}ab^{n_3}cb^{n_4}ab^{n_5}a$. If we use $A$ to stand for any basic word and $C$ to stand for any word of the form $b^nc$ with $n\ge 0$, we can represent these as $CCAAA$ and $CACAA$, respectively. In these terms the words generated by derivations that use $S\to cSS$ once are the words of the form $CAA$.

At this point it’s not hard to see that a derivation that uses $S\to cSS$ three times will produce a word of one of the following forms:

$$\begin{align*} &CCCAAAA,CCACAAA,CCAACAA,\\ &CACCAAA,CACACAA \end{align*}$$

These are the forms that can be obtained by replacing one $A$ in $CCAAA$ or $CACAA$ with $CAA$; this corresponds to replacing one use of the production $S\to bS$ with one use of the production $S\to cSS$.

If we think of each word of the form $C$ as a left parenthesis and each word of the form $A$ as a right parenthesis, it appears so far that we are generating strings that consist of a well-formed parenthesis string followed by one extra right parenthesis. Try to prove by induction on the number of uses of the production $S\to cSS$ that this is precisely what we get; that will give you a complete description of $L$.

Related Question