Automata – Finding Strings Over {a, b} Without Substring ‘aaa’

automataregular expressionsregular-language

I need to identify the set of all possible strings over $\Sigma = \{a, b\}$ that do not contain the substring $aaa$.

I have becoming more familiar with regular-language and languages explored in Automata Theory, and have been defining a particular language through regular-expressions.

I am having a hard time trying to find the set of strings of $\Sigma^*$ (where $\Sigma = \{a, b\}$) that do not contain the substring aaa.

Any advice on how to generate the regular-expression that will define this language would be helpful.

Best Answer

We can describe the set of valid strings as the strings built from the alphabet $V=\{a,b\}$ which contain at most one or two consecutive $a$'s.

These are strings

  • starting with zero or more $b$'s:$$b^*$$

  • followed by zero or more occurrences of $a$ or $aa$ each followed by one or more $b$'s: $$(ab^+|aab^+)^*$$

  • and terminating with zero, one or two $a$'s $$(\varepsilon|a|aa)$$

We obtain

\begin{align*}b^*(ab^+|aab^+)^*(\varepsilon|a|aa)\tag{1}\end{align*}

$$ $$

Add-on: The regular expression (1) generates all valid words in a unique manner. In such cases we can use it to derive a generating function $$A(z)=\sum_{n=0}^\infty a_n z^n$$ with $a_n$ giving the number of valid words of length $n$.

In order to do so all we need to know is the geometric series expansion since the $star$ operator \begin{align*} a^*=\left(\varepsilon|a|a^2|a^3|\cdots\right)\qquad\text{ translates to }\qquad 1+a+a^2+a^3+\cdots=\frac{1}{1-a} \end{align*}

Accordingly $a^+=aa^*$ translates to $\frac{a}{1-a}$ and alternatives like $(\varepsilon|a|aa)$ can be written as $1+a+a^2$.

We obtain by translating the regular expression in the language of generating functions (and by mixing up somewhat the symbolic to provide some intermediate steps)

\begin{align*} b^*\left(ab^+|aab^+\right)^*(\varepsilon|a|aa) &\longrightarrow \quad \frac{1}{1-z}\left(\left.\frac{z^2}{1-z}\right|\frac{z^3}{1-z}\right)^*\left(1+z+z^2\right)\\ &\longrightarrow \quad \frac{1}{1-z}\left(\frac{z^2+z^3}{1-z}\right)^*(1+z+z^2)\\ &\longrightarrow \quad \frac{1}{1-z}\cdot\frac{1}{1-\frac{z^2+z^3}{1-z}}(1+z+z^2)\\ &\quad\quad=\frac{1+z+z^2}{1-z-z^2-z^3} \end{align*}

We conclude: The number of valid words is given by the generating function $A(z)$ \begin{align*} A(z)&=\frac{1+z+z^2}{1-z-z^2-z^3}\\ &=1+2z+4z^2+7z^3+13z^4+\color{blue}{24}z^5+44z^6+81z^7+\cdots \end{align*}

The expansion was done with the help of Wolfram Alpha. We see that e.g. the number of valid words of length $5$ is $24$.

So, out of $2^5=32$ binary words of length $5$ there are $8$ invalid words marked blue in the table below.

\begin{array}{cccc} \color{blue}{aaaaa}\qquad&ab\color{blue}{aaa}\qquad&ba\color{blue}{aaa}\qquad&bb\color{blue}{aaa}\\ \color{blue}{aaaa}b\qquad&abaab\qquad&b\color{blue}{aaa}b\qquad&bbaab\\ \color{blue}{aaa}ba\qquad&ababa\qquad&baaba\qquad&bbaba\\ \color{blue}{aaa}bb\qquad&ababb\qquad&baabb\qquad&bbabb\\ aabaa\qquad&abbaa\qquad&babaa\qquad&bbbaa\\ aabab\qquad&abbab\qquad&babab\qquad&bbbab\\ aabba\qquad&abbba\qquad&babba\qquad&bbbba\\ aabbb\qquad&abbbb\qquad&babbb\qquad&bbbbb\\ \end{array}