[Math] Proving Grammar Correctness

context-free-grammarformal-languages

I'm having difficulties in proving the correctness of grammars. The following problem and my pity attempt to solve it show that I may not understand even the basics of such proofs. Very frutstrating….

Could someone point me to the right direction?

The text book used in my class is "Introduction to Automata Theory, Languages, and Computation (2nd Edition) by John Hopcroft". It contains just a very brief description of how to handle such proofs on page 179, in the box "The Form of Proofs About Grammars". He uses a language of palindromes as example, which is much easier than the following problem (IMHO).

Is there any place where I can find examples of grammar proofs?

Thanks! .. and here is the problem:

"Let L be the language on ({a,b}), of all strings with an equal number of 'a' and 'b'.

$S \to \epsilon | aB | bA $
$A \to aS | bAA$
$B \to bS | aBB$

Prove that G generates language L. "

$\forall w, G\overset{*}{\Rightarrow }w \Leftrightarrow |w|_a=|w|_b$

Proving the left side by induction on the number of derivation steps and the right side on the length of w
Basis:
Left side (one derivation step):
$G\overset{*}{\Rightarrow }\epsilon $
$|\epsilon |_a=|\epsilon |_b$

Right Side (length of w is 0):

$|\epsilon |_a=|\epsilon |_b$
$G\overset{*}{\Rightarrow }\epsilon $

Induction:
Right Side (length of w is n+1)

w=sa (a is one symbol and |s|=n)
$|s|_a=|s|_b$ (inductive hypothesis)
$G\overset{*}{\Rightarrow}s\overset{*}{\Rightarrow }sa$

I can't go ant further… 🙁

Best Answer

You have to prove two things:

  1. For each $w\in\{a,b\}^*$, if $G\stackrel{*}\Rightarrow w$, then $|w|_a=|w|_b$.

  2. For each $w\in\{a,b\}^*$, if $|w|_a=|w|_b$, then $G\stackrel{*}\Rightarrow w$.

Sometimes a proof by induction is easier if you prove something stronger than the desired result. I would prove (1) by proving the following stronger

Claim: For $u\in\{S,A,B,a,b\}^*$ let $\alpha(u)=|u|_a+|u|_A$ and $\beta(u)=|u|_b+|u|_B$. Let $$S=u_0 \Rightarrow u_1\Rightarrow u_2\Rightarrow\dots\Rightarrow u_n$$ be a derivation in $G$, where $u_k\in\{S,A,B,a,b\}^*$ for $k=0,\dots,n$. Then $\alpha(u_k)=\beta(u_k)$ for $k=0,\dots,n$. In particular, if $u_n\in\{a,b\}^*$, then $|w|_A=|w|_B=0$, so $$|u_n|_a=\alpha(u_n)=\beta(u_n)=|u_n|_b\;,$$ and therefore $u_n\in L$.

I would prove this by induction on $n$, the length of the derivation:

  1. The only derivation of length $1$ is the one that you already have, $S\Rightarrow\epsilon$, and $\alpha(S)=\beta(S)=\alpha(\epsilon)=\beta(\epsilon)=0$, so we get the induction started just fine.

  2. Now suppose that the Claim is true for all derivations of length at most $n$, and let $$S=u_0 \Rightarrow u_1\Rightarrow u_2\Rightarrow\dots\Rightarrow u_n\Rightarrow u_{n+1}$$ be a derivation of length $n+1$. Then $S=u_0 \Rightarrow u_1\Rightarrow u_2\Rightarrow\dots\Rightarrow u_n$ is a derivation of length $n$, so by the induction hypothesis $\alpha(u_k)=\beta(u_k)$ for $k=0,\dots,n$. To complete the induction step you must show that $\alpha(u_{n+1})=\beta(u_{n+1})$. You do this by investigating how each of the productions of $G$ changes $\alpha(u_n)$ and $\beta(u_n)$.
    Suppose, for instance, that the derivation $u_n\Rightarrow u_{n+1}$ uses the production $A\to bAA$.
    Then $|u_{n+1}|_a=|u_n|_a$, $|u_{n+1}|_A=|u_n|_A+1$, $|u_{n+1}|_b=|u_n|_b+1$, and $|u_{n+1}|_B=|u_n|_B$, so $\alpha(u_{n+1})=\alpha(u_n)+1=\beta(u_n)+1=\beta(u_{n+1})$, as desired.
    I'll leave the other six productions to you.

Added: The proof of (2) - that every word in L has a derivation - is by induction on the length of $w$, which of course must be even if $|w|_a=|w|_b$. You got this started fine: if $|w|=0$, then $w=\epsilon$, and $G\stackrel{*}\Rightarrow\epsilon$.

Now assume that if $|w|<2n$ and $|w|_a=|w|_b$, then $G\stackrel{*}\Rightarrow w$; that's your induction hypothesis. Let $w\in\{a,b\}$ be such that $|w|=2n$ and $|w|_a=|w|_b=n$; you need to prove that $G\stackrel{*}\Rightarrow w$.

There are two cases. The simpler one is when $w$ is the concatenation of two shorter words belonging to $L$; in that case we can simply 'paste together' the derivations of the shorter words. Here's a way to make that precise.

Let $w=s_1s_2\dots s_{2n}$, where each $s_k\in\{a,b\}$. For $k=1,\dots,2n$ let $$m(k)=|s_1\dots s_k|_a-|s_1\dots s_k|_b\;,$$ the excess of $a$'s over $b$'s in the first $k$ letters of $w$. Suppose that $m(k)=0$ for some $k<2n$; then $k$ is even, and the words $u=s_1\dots s_k$ and $v=s_{k+1}\dots s_{2n}$ belong to $L$. (Why?) By the induction hypothesis, therefore, there are derivations $$S=u_0 \Rightarrow u_1\Rightarrow \dots\Rightarrow u_{p-1}\Rightarrow u_p=u\tag{1}$$ and $$S=v_0 \Rightarrow v_1\Rightarrow\dots\Rightarrow v_{q-1}\Rightarrow v_q=v\;.\tag{2}$$ The only terminal production is $S\to\epsilon$, so it must be the case that $u_{p-1}=xSy$, where $x,y\in\{a,b\}^*$ are such that $xy=u$. If $x=u$ and $y=\epsilon$, we can simply paste $(1)$ and $(2)$ together: $$S=u_0 \Rightarrow u_1\Rightarrow \dots\Rightarrow u_{p-1}=uS=uv_0\Rightarrow uv_1\Rightarrow\dots\Rightarrow uv_{q-1}\Rightarrow uv_q=uv=w\;,$$ and therefore $G\stackrel{*}\Rightarrow w$, as desired.

If $y\ne\epsilon$, I claim that the derivation of $w$ can be rearranged so that $u_{p-1}=uS$, and then we can paste the rearranged derivation together with $(2)$ to see that $G\stackrel{*}\Rightarrow w$. Every production except $S\to\epsilon$ leaves a non-terminal symbol on the righthand end of the word, so there must be some $k<p$ such that $u_k=xS$ and $u_{k+1}=x$ for some $x\in\{S,A,B,a,b\}^*$. Then $x=u_{k+1}\stackrel{*}\Rightarrow w$ is a legitimate derivation in $G$, so $S\stackrel{*}\Rightarrow xS\stackrel{*}\Rightarrow wS\Rightarrow w$ is a legitimate derivation of $w$ whose final step allows us to paste it together with $(2)$.

Note that $m(k)$ and $m(k+1)$ always differ by exactly $1$, so if $m(k)$ ever changes sign as $k$ runs from $1$ to $2n$, there must be some $k<2n$ such that $m(k)=0$. Now suppose that $w$ is not the concatenation of two shorter words of $L$; that means that $m(k)\ne 0$ for $1\le k<2n$, so $m(k)$ does not change sign as $k$ runs from $1$ to $2n$. If $s_1=a$, then $m(1)=1$, and $m(k)>0$ for $1\le k<2n$: every proper initial segment of $w$ has more $a$'s than $b$'s. But we know that $m(2n)=0$, since $w\in L$, so it must be the case that $m(2n-1)=1$, and $s_{2n}=b$. Similarly, if $s_1=b$, then $m(1)=-1$, $m(k)<0$ for $1\le k<2n$, every proper initial segment of $w$ has more $b$'s than $a$'s, and $s_{2n}=a$.

In the first case $w=aub$ for some $u\in\{a,b\}^*$, and in the second case $w=bua$; in either case it's clear that $u\in L$. (Why?) Moreover, $|u|<2n$, so by the induction hypothesis $G\stackrel{*}\Rightarrow u$. Can you see how to modify the derivation of $u$ to get a derivation of $w$, and thereby complete the induction step?

Related Question