[Math] Identify the set of strings of odd length over {a, b} that contain the substring bb

automataregular expressionsregular-language

I need identify the set of possible strings of odd length over $\sum$ (the alphabet) {a, b} that contain the substring bb.

This is new material and a new concept to me and would really appreciate any perspective on how to go about solving this (with an explanation of course).

The only ways I see possible of solving this is through a mathematical proof by induction approach? Or by generating a regular-expression that defines this language (which is also something I feel is a bit ambiguous).

Best Answer

HINT: You could start by finding a regular expression for the strings that contain the substring $bb$, irrespective of their length: the simplest is

$$(a+b)^*bb(a+b)^*\;.\tag{1}$$

Each of the subexpressions $(a+b)^*$ matches the set of all strings over $\{a,b\}$, so $(1)$ matches anything of the form $ubbv$ with $u,v\in\Sigma^*$, i.e., any string that has the substring $bb$ in it somewhere.

But we want only the strings $ubbv$ of odd length. Now $|uv|=|ubbv|-2$, so $ubbv$ has odd length if and only if $uv$ has odd length. And $|uv|=|u|+|v|$, so we want to generate all pairs of strings $u$ and $v$ such that $|u|+|v|$ is odd. That happens precisely when one of $|u|$ and $|v|$ is odd and the other is even.

  • Explain why the regular expression $$(aa+ab+ba+bb)^*\tag{2}$$ matches precisely the strings over $\Sigma$ that have even length.

  • Find a regular expression that matches precisely the strings over $\Sigma$ that have odd length; you may find it helpful to start with $(2)$ and modify or add to it.

  • Use the previous results to get a regular expression that matches precisely the words $ubbv\in\Sigma^*$ such that $|u|$ is odd and $|v|$ is even.

  • Do the same for words $ubbv$ such that $|u|$ is even and $|v|$ is odd.

Now put the pieces together to get a regular expression that matches precisely those words over $\Sigma$ that contain the substring $bb$ and have odd length.