[Math] Find a regular expression for binary strings to have odd non-empty blocks of 1s

regular expressionsregular-language

Write down a Regular Expression for the language $L$ consisting of all binary strings where every non-empty block of $1$s has odd length. (Notice that the empty string is in this language.)

My working out:

$$(111)^* 011$$

This way, there will always be an odd number of $1$s.

Best Answer

Let's look at your regular expression:

$$ (111)^* 011 \enspace. $$

It generates the strings $011$, $111011$, $111111011$, and so on. None of these words is such that every nonempty block of $1$s has odd length. Moreover, the total number of $1$s alternates between even and odd, contrary to your claim. Even if we had guaranteed blocks of $1$s of odd lengths, as in

$$ 1(11)^*01(11)^* \enspace, $$

our goal is not to describe some of the words in $L$, but all of them. This is not entirely trivial in this case; therefore let's take another, more systematic approach.

The strings in our language may start with any number of $0$s. So, we are going to use $0^*$ as a building block. A nonempty string of $1$s of odd length is given by $1(11)^*$. That's another building block for our regular expression. From $1(11)^*$ we get $1$, $111$, $11111$, and so on.

Finally, in between blocks of $1$s we can have any positive number of $0$s. That is, $00^*$. Let's now put these building blocks together:

$$ 0^* (1(11)^* 00^*)^* (1(11)^* + \epsilon) \enspace. $$

It's not the simplest regular expression out there, but we should be able to recognize the building blocks. We start with an optional block of $0$s. Then we have zero or more blocks of $1$s of odd lengths, each followed by one or more zeros. Finally, we have an optional block of $1$s (of odd length) not followed by $0$s to account for the fact that a word in $L$ may not have trailing $0$s.

This is not the only regular expression for $L$, as is often the case. If you have already seen Kleene's Theorem, you should know of the systematic procedure alluded to by John Hughes in his comment. For this $L$ it produces a regular expression of complexity comparable to the one above.

Related Solutions

[Math] Regular expression of the set of strings of even length

I'm not sure what rules are available for your definition of regular expression, but following works in Unix (as an extended regular expression): ([lm][lm])*(k[lm]|[lm]k)([lm][lm])*

Edit: (I also removed the unnecessary anchors that Milo Brandt comments on.)

Breaking this down:

[lm] = represents a single character that could be either l or m. In the wikipedia regex page, this would be expressed by (l|m).
( ) * | all mean the same as explained in the Basic Concepts section of the Wikipedia Regular Expressions page.
([lm][lm])* is thus a string of any even number of ls and ms.
(k[lm]|[lm]k) is either k followed by an l or an m, or it is an l or m followed by a k.

So the whole expressions says "a string starting with any number of any of the substrings ll, lm, ml, mm, followed by one of the substrings kl, km, lk, lm, followed by any number of any of the substrings ll, lm, ml, mm.

Every string from $\{m, l, k\}$ of even length with exactly one $k$ in it will match this.

Regular expression for all strings with even length and not containing “ab” and “ba”

We need $7$ non-terminals. $S$ is the initial state, $T,\ U,$ and $V$ all indicate that an odd number of characters have been read, and that the last character read was $a$, $b$, or $c$, respectively. Similarly, $X,\ Y$, and $Z$ indicate that an even number of characters have been read, and that the last character read was $a$, $b$, or $c$, respectively. I use $\lambda$ for the empty string. $S,\ X,\ Y$, and $Z$ are accepting states, so we have the productions $$S\to\lambda\\ X\to\lambda\\ Y\to\lambda\\ Z\to\lambda\\ $$ Also we, have $$S\to aT\\S\to bU\\S\to cV\\T\to aX\\T\to cZ\\U\to bY\\U\to cZ\\V\to aX\\V\to bY\\V\to cZ\\$$

I trust you see how I arrived at these. If we get a $b$ in state $T$ or an $a$ in state $U$ then we've seen a forbidden substring, so there are no productions corresponding to these possibilities.

It remains to determine the productions with $X,\ Y$ or $z$ on the left-hand side. I leave it to you to supply those.

EDIT Reading this over, I see that I've slipped into the language of automata at times. I hope it's obvious what I mean. If not, ping me.

EDIT

When I wrote this, I though you were mainly concerned with whether or not the language is regular, but it's apparent from your comments that you're mostly concerned with getting a regex. Remember that the complement of a regular language is regular. So we want the complement of the union of

-set of strings containing $ab$
-set of strings containing $ba$
-set of strings with an odd number of characters

So, if you can make regexes for these three languages, you can combine them to construct the regex you seek.

EDIT

The previous edit is wrong. The discussion prior to theorem $4.5$ in Hopcroft, Motwani and Ullman says that you can't do it that way. You have to convert the regex to a DFA, construct a DFA that recognizes the complement, and then convert the new DFA to a regex. I think any regex for this language will be exceedingly long and complicated.

Best Answer

Related Solutions

[Math] Regular expression of the set of strings of even length

Regular expression for all strings with even length and not containing “ab” and “ba”

Related Question