[Math] Difference between regular and not regular languages

regular expressionsregular-language

I don't understand the difference, in classes I've seen the pumping lemma for regular languages and I know how to apply it to demonstrate whether a language is regular or not, but I feel that I don't understand the background meaning of it… For example:

$(0^*1^*)$ is a regular language defined by a regular expression, while the language defined by $\{0^n1^n, n \geq 0\}$ isn't, but in the end, don't they represent the same set of strings?, for me, any string with $0$ or more zeros, followed by $0$ or more ones should belong to both languages.

Best Answer

A regular language is one that can be recognized by a finite machine—that is by a computer with a finite amount of memory.

$\mathtt 0^*\mathtt 1^*$ is the family of strings that have some zeroes followed by some ones. Suppose you had some string a billion symbols long and you had to decide if it had this form. You could do that with hardly any memory: just scan the string from left to right, and keep track of whether you had seen a $\mathtt 1$ yet. If you had, and then you saw a $\mathtt 0$, you could stop and say “Nope! Wrong form!” If you got to the end of the string before that happened, you could say “Yup!”. You would be right every time, and all you would need to remember was “have I seen any ${\mathtt 1}$s yet?”

But $\{\mathtt 0^n\mathtt 1^n\mid n\ge 0\}$ is the family of strings that have some zeroes followed by exactly the same number of ones. If you had some string that was a billion symbols long and you wanted to decide if it had this form, the only way would be to count the number of zeroes and then compare it to the number of ones. And if you had only a finite amount of memory, there would be some number of zeroes that was so big that you wouldn't have enough memory to remember it. If the string was too long, you would lose count, and get the wrong answer.

(The pumping lemma proof that $\{\mathtt 0^n\mathtt 1^n\mid n\ge 0\}$ is not regular is a precise formalization of the claim that, if the string was long enough, you would lose count and get the wrong answer.)

Related Question