Writing a grammar that creates a specific language from a given grammar in Chomsky normal form

context-free-grammarformal-languages

Given a grammar in Chomsky normal form that creates a language $L$ over alphabet $\Sigma$, that the letter z doesn't belong to $\Sigma$, without the empty string. I need to write a context-free grammar that creates the following language $L^\prime$:

$$L^\prime = \{a_1\mathtt{z}a_2\mathtt{z}a_3\mathtt{z}\ldots a_n\mathtt{z} \mid n \geq 1,\;\wedge\; \forall i, a_i\in \Sigma \; \wedge \; \exists b_1\ldots b_n\in\Sigma :a_1b_1\ldots a_nb_n\in L\}$$

(you can also see this picture for L' definition)

So I think every rule A -> a in the given Chomsky normal form grammar should be changed to A -> az but I am not sure if this is enough.

Best Answer

The language of strings from $L$ where every other symbol has been replaced with $\mathtt{\#}$.

You can do it by having each nonterminal symbol $A$ keep track of two more pieces of information: (1) Will $A$ become an even-length string or odd-length string? (2) Will $A$ start with $\mathtt{z}$ or some other character?

Keeping track of odd or even. Suppose we use subscripts to show whether the symbol will become an odd- or even-length string: $A_0$ will eventually become an even-length string, and $A_1$ will become an odd-length string. Let us modify the original grammar so that it uses these new symbols and respects these parity rules.

In Chomsky normal form (for a language without the empty string $\epsilon$), there are only two sorts of rules: $A\rightarrow BC$, and $A\rightarrow \mathtt{a}$, where $\mathtt{a}\in\Sigma$.

Now, if $A$ becomes a single symbol $\mathtt{a}\in \Sigma$, then $A$ has odd length. Hence whenever the old grammar had $A\rightarrow \mathtt{a}$, a grammar that keeps track of string parity should have $A_1\rightarrow \mathtt{a}$.

As for rules of the form $A\rightarrow BC$, we know that if $A$ has even length, then $B$ and $C$ are both odd, or both even. And if $A$ has odd length, then $B$ and $C$ are odd-and-even, or even-and-odd. Hence wherever the old grammar has $A\rightarrow BC$, a grammar that keeps track of parity should have:

\begin{align*} A_0 &\rightarrow B_0C_0 \;\mid\; B_1C_1\\ A_1 &\rightarrow B_1C_0 \;\mid\; B_0C_1\\ \end{align*}

Finally, we need a start symbol. If $S$ was the old start symbol, we can simply put $S\rightarrow S_0 \mid S_1$ to allow the entire string to be either odd or even, without constraint.

Expanding the grammar to keep track of parity does not change its behavior in any way— it will still produce the same strings as usual. But keeping track of parity will make it easy to keep track of where the $z$'s belong, which is our next task.

Enforcing alternating $z$ symbols.

Now that we keep track of parity (odd or even length), we can easily modify the grammar to replace every other character with $z$.

For example, suppose you know that $A_1$ starts with $z$, and that $A_1\rightarrow B_0C_1$. Then you know that $B_0$ starts with $z$. And because $B_0$ starts with $z$ and has even length, the alternating pattern of $z$'s means that you know $C_1$ must start with $z$ also.

In a similar way, given that you know whether $A$ starts or ends with $z$, the parity of $B$ and $C$ tells you whether $B$ and/or $C$ must also start with $z$. You can reason about all the possible cases.

Putting it all together

Let's use $A^z$ to mean a string that must start with $z$, and $A$ to mean a string that must start with an ordinary character in $\Sigma$.

Altogether, there are now four possibilities: $A_0, \;A_1, \;A_0^z, \;A_1^z$.

Our job is to convert the rules of the grammar to use the new symbols so that the grammar follows our rules about even and odd strings, and puts $\mathtt{z}$ characters where they belong.

  1. Whenever the original grammar has $A\rightarrow \mathtt{a}$, the new grammar should have $A_1\rightarrow \mathtt{a}$ and $A_1^z\rightarrow \mathtt{z}$.

    This says that only an odd-length string (subscript 1) may become a single character. If it has a superscript $z$, it must become the character $\mathtt{z}$. Otherwise, it may become the usual character $\mathtt{a}$.

  2. Assuming strings in the new language must have even length and start with an ordinary symbol in $\Sigma$, the start symbol in the new grammar must be $S_0$.

  3. Whenever the original grammar has $A\rightarrow BC$, the new grammar should have: \begin{align*} A_0 &\rightarrow B_0C_0 \mid B_1C_1^z\\ A_1 &\rightarrow B_1C_0^z \mid B_0C_1\\ A_0^z &\rightarrow B_0^zC_0^z \mid B_1^zC_1\\ A_1^z &\rightarrow B_1^zC_0 \mid B_0^zC_1^z\\ \end{align*}

    Once you know whether $B$ and $C$ are odd or even, and you know whether $A$ starts with $z$ or not, the alternating pattern of regular-character-followed-by-$z$ tells you whether $B$ and $C$ should start with $z$ or not.

    Knowing whether $B$ or $C$ start with $z$ is enough to enforce the alternating pattern of $z$'s throughout the entire string, because you will eventually recursively expand all of these nonterminal symbols to cover every character in the string.

Related Question