[Math] Prove that the set of palindromes are not regular languages using the pumping lemma.

computer scienceformal-languages

Firstly I pick a language $xyz$ where $x = \epsilon$, $y = (abb)^{k}$, $z = (bba)^{k}$ where $|y| \ge$ the number of states in the automaton representing my language. Then $xyz = (abb)^k(bba)^k$ is a palindrome.

Now I split $y$ into $uvw$, where $v \ne \epsilon$ contains a state visited more than once. If the language is regular then $xuv^iwz$ is within the language $\forall i \ge 0$.

I choose $i = 2$, then $uv^2w = (abb)^n$ where $n > k$.

Then $xuv^2wz = (abb)^n(bba)^k$ where $n > k$, which is not a palindrome.

Firstly, is my reasoning for this particular proof correct? I'm unsure about how to properly apply the pumping lemma to this problem. And secondly the question asks me to state a proof for palindromes in general, whereas I only attempted to prove it for a single case. Is there an example I can choose which proves the conjecture for all palindromes? I assume not, seeing as I can't think of how that could be represented using regular expressions.

Best Answer

Your proof is incorrect, for instance you only handle a subset of palindromes (and regular languages can have non-regular subsets). I'm not quite sure what you're trying to do there, but for reference here's my version of the proof using the pumping lemma. I hope it opens up valuable insights.

In the proof, I will assume the alphabet to consist of at least two distinct symbols (as palindromes are a regular language if there is only one symbol in the alphabet).

My version of the proof

(using $s^R$ to denote the reverse of $s$)

Let $L = \{s : s = s^R\}$ or the set of all palindromes (strings that are equal to their reverse strings).

If $L$ is regular, there must be a pumping length $p \ge 1$ so that for each $s \in L$, if $|s| \ge p$ we can write $s$ as $xyz$ so these three conditions apply:

  • $y \neq \epsilon$ ($y$ has at least one character/symbol, in other words $|y| \ge 1$)
  • $|xy| \le p$
  • for all $i \in \mathbb{N} \cup \{0\}$, $xy^iz \in L$

Let $s = a^pba^p$. We can see $s$ is a palindrome, so $s \in L$. Also, $|s| = 2p + 1 \ge p$.

Because of the two first conditions, all possible $xyz$ divisions of $s$ are in the following form:

  • $x = a^k$
  • $y = a^m$
  • $z = a^{p-m-k}ba^p$

for some non-negative integers $k, m$ so that $k + m \le p$ and $m \ge 1$.

Now let's demonstrate that $xy^iz \in L$ doesn't hold for some values of $i$.

For example, let $i = 2$. Then $$xy^2z = xyyz = a^ka^ma^ma^{p-m-k}ba^p$$

which can be simplified further as $$a^ma^pba^p = a^{p+m}ba^p$$

The reverse string, $(xy^2z)^R$ is $a^pba^{p+m}$. Because we know that $m \ge 1$, it follows that $$a^pba^{p+m} \neq a^{p+m}ba^p$$ Therefore, $xy^2z \notin L$ under any possible divisions $xyz$ while following the three conditions ruled by the lemma. Hence, $L$ must be non-regular.

Two important notes

  • You can choose any string $s \in L$, as long as $|s| \ge p$. Make life easy for yourself and pick a string that makes the rest of the lemma easy to apply. I picked $a^pba^p$ simply to reduce the complexity of different $xyz$ divisions.
  • For a language to be non-regular, you must prove there are no possible divisions $xyz$ that are possible according to the three criteria. It's not enough to demonstrate that some $xyz$ division fails at the $xy^iz \in L$ phase.
Related Question