Context-Free Grammar – CFG for Language

context-free-grammarformal-languages

I can't find CFG for this language.

$N = \{w ∈ \{0, 1\}^* \mid \text{w contains more } 1\text{s than }0 \text{s}\}$

Basically, it is much easier when order is set (for example, $1$s come after $0$s) but in this situation I got stuck.

I can make a PDA for that but I'm not strong at converting PDA to CFG.

Thanks in advance.

Best Answer

Simple solution

$$S\to E|E1S\\ E \to 0E1E|1E0E|\varepsilon$$

It's clear that anything generated by $E$ will verify $|u|_0 = |u|_1$ and therefore that anything generated by $S$ will verify $|u|_0 \le |u|_1$.

Claim A: $u\in \mathcal L(E)\iff |u|_0 = |u|_1$

Proof A: By induction on $|u|$.

Since $u\in \mathcal L(E)\iff \overline{u}\in \mathcal L(E)$ (where $\overline{u}$ is $u$ with $0$ and $1$ swapped), we can assume WLOG that $u=1v$. Take $w_1$ the greatest prefix of $v$ so that $|w_1|_0=|w_1|_1$. It exists because $\varepsilon$ is one such prefix. Write $v=w_1w_2$ where $w_2$ is just whatever remains. Since we took $w_1$ the greatest with the property, any prefix of $w_2$ has more $0$s than $1s$. In particular, $w_2$ starts with a $0$. So $w_2=0h$. $w_1$ and $h$ both have as many $0$s as $1$s so by the induction hypothesis, $E\to^*w_1$ and $E\to^*h$. So $E\to0E1E\to^*0w_11h=u$ and we're done.

Claim B: $u\in\mathcal L(S)\iff |u|_0\le |u|_1$

Proof B: Left to the reader. One direction come directly from claim A and the other is by induction on $|u|_1-|u|_0$ (and by taking the greatest prefix for which $|u|_0= |u|_1$ to find the part that comes from $E$).


$\Sigma:= \{0,1\}$

If you want to have more $1$s that $0$s, you just have to make sure that whenever you add a $1$ you add a $0$.

$$S \to E|E1T\\ E \to DE|UE|\varepsilon\\ T \to UT|1T|\varepsilon\\ O \to 0 | 1OO\\ I \to 1|0II\\ \\ D \to 0I\\ U \to 1O$$

This is done with $I$ that means "add one more $1$ than $0$s" (except that it only generates words for which this is false for all their prefixes). Similarly $O$ means "add one more $0$ than $1$s". $U$ therefore means "bounce upwards and come back to where you were" and $D$ "bounce downwards and come back to where you were". $E$ means "bounce upwards or downwards and come back to where you were as many times as you want". $T$ means "(Go one step up or bounce upwards and come back to where you were) as many times as you like". With this interpretation, it feels plausible that $S$ generates what we want because either you take $E$ and it means that you'll have equality, or you take $E1T$ and you'll get strict inequality because once you're done with $E$, you go up once and then $T$ doesn't let you go down so you'll be strictly above the $|v|_1-|v|_0=0$ line.

Here is an example of how $|v|_1-|v|_0$ can evolve with the prefix $v\le u$.

111000000110100111111000111001110                                                              

                               /\                                       
  /\                /\    /\  /                                              
 /  \              /  \  /  \/                                               
/____\____________/____\/_______________________________       
      \          /                                                        
       \  /\/\  /                                                         
        \/    \/  
1I----0I----------1O----11O--111O
U-----D-----------U-----1U---11U-
E-----------------------1T-------                                                   

I'll remove $U$ and $D$ from the grammar and will prove it for this one but it should be clear that those two grammars are equivalent.

$$S \to E|E1T\\ E \to 0IE|1OE|\varepsilon\\ T \to 1OT|1T|\varepsilon\\ O \to 0 | 1 OO\\ I \to 1|0II$$


Claim 1: Define $H(u)$:$|u|_0 > |u|_1$ and for any strict prefix $v<u$, $|u|_0 \le |u|_1$. Then for any $u\in\Sigma^*$,$u\in \mathcal L(O) \iff H(u)$. Note that $H(u)$ implies $H'(u)$:"$|u|_0=|u|_1+1$" and that the last letter of $u$ is $0$.

Proof 1:

  • "$\Rightarrow$": Let $P(n)$: "If $O\to^m u\in \Sigma^*$ for some $m\le n$, then H(u)". We prove $P(n)$ by induction on $n\ge 1$.

    • If $O\to^1 u\in \Sigma^*$, then the production must be $O\to 0$ so that $u=0$ which verifies the properties.

    • If $0\to^{n+1} u \in \Sigma^*$ for $n>0$, then there is $v\in (\Sigma\sqcup \Gamma)^*$ so that $I\to v \to^n u$. The first production can only be $O\to 1OO$ so that $v=1OO$. By the fundamental lemma, $u$ can be written as $u=u_1u_2u_3$ so that $1\to^{n_1} u_1$, $O\to^{n_2} u_2$, $O\to^{n_3} u_3$, $n=n_1+n_2+n_3$. Then $H(u)$ follows from $H(u_2)$ and $H(u_3)$ (which are given by the induction hypothesis because $\forall i, n_i \le n$) and $u=0u_2u_3$.

  • "$\Leftarrow$": Let $P(n)$: "For any $u\in \Sigma^*$ so that $H(u)$, $u\in \mathcal L(O)$". We prove $P(n)$ by induction on $n$.

    • $P(0)$ is vacuously true.

    • Suppose $P(n)$. Let $u\in \Sigma^*$ so that $|u|=n+1$ and $H(u)$. Then $u=av$ for some $a\in \Sigma$ and $v\in\Sigma^*$.

    • If $a=0$, then $a<u$ would lead to $1=|0|_0\le |0|_1=0$. So we have $0=a=u$ the derivation $O\to u$ works.

    • If $a=1$, then we take $w_1$ to be the shortest prefix of $v$ so that $H(w_1)$ and we take $w_2$ so that $u=1w_1w_2$ holds. Since $H(u)$ and $H(w_1)$, we also have $H'(u)$ and $H'(w_1)$. This gives us $H'(w_2)$ and so $|w_2|_0>|w_2|_1$. Take $p<w_2$. Then $1w_1p<u$ so that $|w_1|_1+1+|p|_0=|w_1|_0+|p|_0=|1w_1p|_0\le |1w_1p|_1=1+|w_1|_1+|p|_1$ and we indeed have $|p|_0\le |p|_1$.


Claim 2: Define $\overline{H}(u)$:$|u|_1 > |u|_0$ and for any strict prefix $v<u$, $|u|_1 \le |u|_0$. Then for any $u\in\Sigma^*$,$u\in \mathcal L(I) \iff \overline{H}(u)$. Note that $\overline{H}(u)$ implies $\overline{H}'(u)$:"$|u|_1=|u|_0+1$" and that the last letter of $u$ is $1$.

Proof 2: Similar to proof 1.


Claim 3: $u\in \mathcal L(T)\iff$ for any prefix $v\le u$, $|v|_1 \ge |v|_0$.

Proof 3:

  • "$\Rightarrow$": Follows from claim 1 by induction on the size of the derivation.

  • "$\Leftarrow$": Take $u\in\Sigma^*$ so that for any prefix $v\le u$, $|v|_1 \ge |v|_0$. First note that the first letter of $u$ must be $1$ (if $u$ is not $\varepsilon$). We will prove it by induction on $|u|$.

    • If $u=\varepsilon$, we have $T\to u$.

    • If it exists, take $v$ to be the smallest prefix of $u$ so that $|v|_0=|v|_1$. Then $v=1v'$. Since we took the smallest prefix, we have $H(v')$ and we therefore have $O\to^*v'$ by claim 1. Write $u=vw$ for some $w$. Then by the induction hypothesis, $T\to^* w$ and then $T\to^*1OT\to^*1v'w=u$.

    • If it doesn't exist, write $u=1v$. The fact that the prefix doesn't exist implies that $v$ also verifies the property. By the induction hypothesis, we therefore have $T\to^*v$. And we conclude that $T\to 1T\to^*1v=u$.


Claim 4: $u\in\mathcal L(E) \iff |u|_0=|u|_1$

Proof 4: Left to the reader. One implication follows directly from claim 1 and 2 and the other implication is a case analysis on the first letter.


Claim 5: $u\in \mathcal L(S)\iff |u|_0\le |u|_1$

Proof 5: Left to the reader. One implication follows directly from claims 3 and 4 and the other one has two cases: $|u|_0=|u|_1$ or not, and if not $u$ is factored into $u_11u_2$ where $u_1$ is the greatest prefix so that $|u_1|_0=|u_1|_1$.


Bonus claim: The grammar is non-ambiguous.

Bonus proof: Left to the reader. The idea is that the languages generated by $E$ and $E1T$ will be disjoint so you only have to check that each part is non-ambiguous separately. The part of the grammar used by $E$ is $LL(1)$ so it's non-ambiguous. Then for a word $u$, suppose you can write it as $e_11t_1=u=e_21t_2$ with $E\to^*e_i$ and $T\to^*t_i$. $e_i$ is the longest prefix $v<u=e_i1t_i$ so that $|v|_0=|v|_1$ because $|e_i|_0=|e_i|_1$ and for each prefix $w<t_i$, $|w|_0\le|w|1$. This implies that $e_1=e_2$ and therefore that $t_1=t_2$. By a previous argument, there is only one derivation tree for $e_1$ so all that remains to be checked is that there is only one for $t_1$, or in other words that the part of the grammar used by $T$ is non-ambiguous. It is because you can only take $T\to1OT$ if you're going to come back to the level you are at and $T\to1T$ if you're not.