[Math] The Language {xx}, of any string followed by another copy of the same string, is not Context Free. Using Pumping Lemma, through 13 cases.

computer sciencecontext-free-grammarpumping-lemma

I write here the Pumping Lemma for Context Free Languages I refer:

For all context free languages $L$ there exists some $k \in \mathbb{N}
\mid \forall z \in L$
with $\left | z \right | \ge k$, there exist
$uvwxy$ such that:

  1. $z= uvwxy$,
  2. $v$ and $x$ are not both $\epsilon$,
  3. $\left | vwx \right | \le k$, and
  4. $\forall i, uv^iwx^iy \in L$

And this is the Language I want to prove that is not Context Free:
in the following I'll write CF(L)=Context Free (Languages), PL=Pumping Lemma

$L = \left \{ xx \mid x \in \Sigma^* \right \}$, is not CF for any alphabet $\Sigma$ with at least two symbols.

Proof

  1. by contradiction using PL for CFL. Assume that $L$ is CF, so the PL holds for $L$.

  2. Let $k$ be as given by the PL. Choose $z = a^kb^ka^kb^k$, so $z \in L$ and $|z| \ge k$ as required.

  3. Let $u,v,w,x,y$ be as given by the PL, so $uvwxy = a^kb^ka^kb^k$, $v$ and $x$ are not both $\epsilon$, $|vwx| \le k$, and $\forall i$, $uv^iwx^iy \in L$.

  4. Now consider how the substrings $v$ and $x$ fall within the string. Since $|vwx| \le k$, $v$ and $x$ cannot be widely separated, so we have these 13 cases:

lang_xx

For cases 1 through 5, we can choose $i = 0$. Then the string $uv^0wx^0y$ is some $sa^kb^k$ where $|s| < 2k$. In this string the last symbol of the first half is an $a$, while the last symbol of the second half is a $b$. So $uv^0wx^0y \notin L$.

For cases 9 through 13, $i = 0$, the string $uv^0wx^0y$ is some $a^kb^ks$ where $|s| < 2k$. In this string the first symbol of the first half is an $a$, while the first symbol of the second half is $b$. So, again $uv^0wx^0y \notin L$.

For cases 6,7,8, $i=0$, the string $uv^0wx^0y$ is some $a^ksb^k$ where $|s| < 2k$. But such an $a^ksb^k$ can't be $rr$ for any $r$, because if $r$ starts with $k$ as and ends with $k$ bs, we must have $|r| \ge 2k$ and so $|rr| \ge 4k$, while our $|a^ksb^k| < 4k$. So again, $uv^0wx^0y \notin L$.

  1. In every case we have a contradiction, so by contradiction, $L$ is not a CFL. $\Box$

There are some parts I don't understand-
I consider cases from 9 through 13, where I think it is more evident my misunderstanding:

For cases 9 through 13, $i = 0$, the string $uv^0wx^0y$ is some
$a^kb^ks$ where $|s| < 2k$. In this string the first symbol of the
first half is an $a$, while the first symbol of the second half is $b$. So, again $uv^0wx^0y \notin L$.

What is "this string"?

  • Is it $a^kb^ks$?
    If so, considering as first half $a^kb^k$, here the first symbol is $a$; and if the second half is $s$, since $v$ and $x$ must be not both $\epsilon$, $s$ must begin with $a$, but, why it is said $b$? Can you explain me better?

  • Or is it $s$, and divide it in two parts?
    Here, the first symbol of the first half $s$ must be $a$, since $v$ and $x$ aren't both $\epsilon$, and the first symbol of the second half of $s$ is $b$.

(In analog way for cases 1 through 5, so if I understand the above, I understand also those ones).

And also in cases 6,7,8:

For cases 6,7,8, $i=0$, the string $uv^0wx^0y$ is some $a^ksb^k$
where $|s| < 2k$. But such an $a^ksb^k$ can't be $rr$ for any $r$,
because if $r$ starts with $k$ as and ends with $k$ bs, we must
have $|r| \ge 2k$ and so $|rr| \ge 4k$, while our $|a^ksb^k| < 4k$. So
again, $uv^0wx^0y \notin L$.

Here, I don't understand what it is said about $rr$.

Can you explain me better? Many many thanks for your patience, really!! 🙂

Best Answer

For cases 9 through 13 :

"This string" is $uv^0wx^0y=a^kb^ks$. If "this string" is in $L$, it must have even length, so $s$ has even length too, say $2t$ for some integer $t$. The total length of "this string" is then $k+k+2t=2(k+t)$. Since $v$ and $x$ are not both empty, $uvwxy$ has length strictly greater than $uv^0wx^0y$, so "this string" has length $<4k$, and we deduce that $t<k$.

The first half of "this string" is made of the first $k+t$ symbols, so the first half is $a^kb^t$. As you can see, this first half starts with an $a$. The second half of "this string" is made of the last $k+t$ symbols, so the second half is $b^{k-t}s$. As you can see, this second half starts with a $b$.

For cases 6,7,8 : In this case, "this string" is $uv^0wx^0y=a^ksb^k$. As above, $s$ must have even length, say $2t$ for some integer $t$. The total length of "this string" is then $k+k+2t=2(k+t)$. Note that $t<k$. Denote by $s_1,s_2,\ldots,s_{2t}$ the symbols in $s$. The first half of "this string" is made of the first $k+t$ symbols, so the first half is $a^ks_1s_2\ldots s_t$. As you can see, this first half starts with $a^k$. The second half of "this string" is made of the last $k+t$ symbols, so the second half is $s_{t+1}s_{t+2}\ldots s_{2t}b^k$. As you can see, this second half ends with $b^k$. If "this string" is in $L$, the first half and the second half are in fact the same string, called $r$ in the text. Then $r$ must both start with $a^k$ and end with $b^k$, which forces $r$ to have length $\geq 2k$. This means that $uv^0wx^0y=rr$ has length $\geq 4k$. Since $v$ and $x$ are not both empty, $uvwxy$ has length strictly greater than $uv^0wx^0y$, and we deduce $4k>4k$ which is absurd.

Hope this helps !

Related Question