[Math] Best approach to determine the equivalence classes of a formal language

automata, equivalence-relations, formal-languages

I constructed a minimal automaton for a formal language using the Myhill-Nerode theorem. The language for which I created the automaton is defined by

$L=\{w \in \{a,b\}^*:w=av \text{ for a word } v \in\{a,b\}^* \text{ and } |w|_b \text{ is even}\}$

I determined the following four equivalence classes for this language:

$[\epsilon]$, $[a]$, $[b]$, $[ab]$

After that, I used the equivalence classes as the set of states of the minimized automaton and constructed it.

For this example, I just drew an automaton that accepts the language and used it as a reference to determine the equivalence classes. But this approach is time-consuming.

What is the best approach to determine the equivalence classes of a formal language, e.g. for the language above? How would you proceed?

Best Answer

In general:

For a general approach I don't know anything better than constructing an automaton, minimizing it, and then reading off the equivalence classes. However, there are a few tricks that very often speed up this process, or sometimes even make it unnecessary. Still, this works only because the automata you usually have to deal with are quite simple; if you had to do this for an automaton with a lot of states, writing a computer program that does it would be faster.
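For what it's worth, the core of such a program is short. Here is a minimal sketch in Python (the DFA encoding via dictionaries is my own choice, not any standard API): it computes the Myhill-Nerode classes of a complete DFA by partition refinement, starting from the accepting/non-accepting split.

```python
def nerode_classes(states, alphabet, delta, accepting):
    """Myhill-Nerode classes of a complete DFA via partition refinement.

    states: set of states; delta: dict (state, symbol) -> state;
    accepting: set of accepting states. Assumes all states are reachable.
    """
    # Coarsest split: accepting vs. non-accepting states.
    partition = [b for b in (frozenset(accepting),
                             frozenset(states - set(accepting))) if b]
    while True:
        refined = []
        for block in partition:
            # States stay together iff every symbol sends them into
            # the same block of the current partition.
            groups = {}
            for s in block:
                key = tuple(
                    next(i for i, b in enumerate(partition)
                         if delta[(s, a)] in b)
                    for a in alphabet
                )
                groups.setdefault(key, set()).add(s)
            refined.extend(frozenset(g) for g in groups.values())
        if len(refined) == len(partition):   # nothing split -> stable
            return refined
        partition = refined

# The question's DFA for L, states named after class representatives:
states = {"e", "a", "ab", "b"}
delta = {("e", "a"): "a", ("e", "b"): "b",
         ("a", "a"): "a", ("a", "b"): "ab",
         ("ab", "a"): "ab", ("ab", "b"): "a",
         ("b", "a"): "b", ("b", "b"): "b"}
print(nerode_classes(states, "ab", delta, {"a"}))
# -> four singleton classes: the automaton is already minimal
```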

Relevant intuition on regular languages:

The regular languages are, intuitively, the languages that require only a finite amount of information at any given time to decide whether the input belongs to the language.

To be more precise, "a finite amount of information" means that there are finitely many variables, each with finitely many possible values. Thus, even if we take all possible configurations, we still end up with finitely many of them, and exactly this notion is captured by finite automata: each state corresponds to one configuration.

That is also why the language of balanced parentheses, given by the grammar $X \to X\mathtt{(}X\mathtt{)} \mid \varepsilon$, is not regular: to parse it you need to know at any given moment how many parentheses are open, and that number can be arbitrarily large, so even though it is only a single number, it is not a finite amount of information.

However, this intuition is even more apparent when you consider the syntactic monoid of a language. A different characterization of regular languages is that they are exactly the languages recognizable by finite monoids. In particular, if $M$ is the smallest monoid that recognizes the language $L$ via some homomorphism $\phi : \Sigma^* \to M$, then $\phi$ is actually an information filter that distills all the necessary information and leaves nothing more. Another way to think about it: each word $w \in \Sigma^*$ contains some information, but not all of it is needed to decide whether $w \in L$. When we map $w$ into $M$ as $\phi(w)$, what we get is a finite representation of all the information that can ever be necessary to deal with $w$ in the context of $L$.
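This view can also be made concrete. The sketch below (same home-made DFA encoding as in the first sketch) computes the transition monoid of a DFA: each letter induces a function on the states, and closing the letter functions under composition yields the image of $\phi$. For a minimal DFA this transition monoid is exactly the syntactic monoid.

```python
def transition_monoid(states, alphabet, delta):
    """Close the letter-induced state maps under composition."""
    order = sorted(states)
    idx = {s: i for i, s in enumerate(order)}
    identity = tuple(order)                  # f(order[i]) is stored at slot i
    letters = {a: tuple(delta[(s, a)] for s in order) for a in alphabet}
    monoid, frontier = {identity}, [identity]
    while frontier:
        f = frontier.pop()
        for g in letters.values():
            fg = tuple(g[idx[t]] for t in f)   # apply f first, then g
            if fg not in monoid:
                monoid.add(fg)
                frontier.append(fg)
    return monoid

# The question's four-state DFA for L (same encoding as the first sketch):
states = {"e", "a", "ab", "b"}
delta = {("e", "a"): "a", ("e", "b"): "b",
         ("a", "a"): "a", ("a", "b"): "ab",
         ("ab", "a"): "ab", ("ab", "b"): "a",
         ("b", "a"): "b", ("b", "b"): "b"}
print(len(transition_monoid(states, "ab", delta)))
# -> 5: the maps of epsilon, a, b, ab and bb; every longer word repeats one
```

Note that the syntactic monoid has $5$ elements here while there are only $4$ Myhill-Nerode classes: the monoid tracks a two-sided congruence, so it can be slightly finer.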

The basic trick:

The trick I mentioned at the beginning is to guess what information suffices to decide whether $w \in L$. For example, in your case

$$L=\Big\{w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ w=\mathtt{a}v \text{ for a word } v \in\{\mathtt{a},\mathtt{b}\}^* \text{ and } |w|_{\mathtt{b}} \text{ is even}\Big\}$$

we have two bits of information:

  • whether $w$ starts with $\mathtt{a}$,
  • whether $|w|_{\mathtt{b}}$ is even.

This implies that we will need at most $2\cdot2 = 4$ states, and that is exactly what you get. Such guessing requires a bit of insight; for example,

$$L_3=\Big\{w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ w=\mathtt{a}v \text{ for a word } v \in\{\mathtt{a},\mathtt{b}\}^* \text{ and } |w|_{\mathtt{b}} \text{ is divisible by } 3\Big\}$$

also seems to carry two bits of information, but it actually needs more than $4$ states. The new condition, $|w|_{\mathtt{b}}$ divisible by $3$, requires three values to handle (remainder $0$, $1$ and $2$), so that when we concatenate two words $w_1$ and $w_2$ we can tell the new value just from the values for $w_1$ and $w_2$ instead of looking at the whole words (with only two possible values we would not be able to tell the difference between $|w_1|_{\mathtt{b}} = 1 = |w_2|_{\mathtt{b}}$ and $|w_1|_{\mathtt{b}} = 1, |w_2|_{\mathtt{b}} = 2$). The product bound thus becomes $2 \cdot 3 = 6$, and the minimal automaton turns out to have $5$ states: $[\epsilon]$, $[\mathtt{b}]$, and one class per remainder for words starting with $\mathtt{a}$; see the sketch below.
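To see the trick as a computation, here is a sketch (hypothetical helpers of my own; `nerode_classes` is the refinement routine from the first sketch) that tracks the guessed features as a product state, explores the reachable product DFA for $L_3$, and then minimizes it, showing which feature combinations actually survive.

```python
def product_dfa(alphabet, init, step, is_accepting):
    """Explore the reachable states of a feature-product automaton."""
    states, frontier, delta = {init}, [init], {}
    while frontier:
        s = frontier.pop()
        for x in alphabet:
            t = step(s, x)
            delta[(s, x)] = t
            if t not in states:
                states.add(t)
                frontier.append(t)
    return states, delta, {s for s in states if is_accepting(s)}

# Features for L_3: (first letter, or None if the word is empty; |w|_b mod 3).
def step(state, x):
    first, rem = state
    return (first or x, (rem + (x == "b")) % 3)

states, delta, accepting = product_dfa(
    "ab", (None, 0), step, lambda s: s[0] == "a" and s[1] == 0)
print(len(states))                                    # 7 reachable feature states
print(len(nerode_classes(states, "ab", delta, accepting)))   # -> 5 after merging
```

The three cells whose first letter is $\mathtt{b}$ merge into a single dead class, which is exactly why the naive product count overshoots.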

Some caveats:

Now that we are able to identify all the atoms (compare with atoms in probability theory), we can generate some equivalence classes, but these may not be minimal. For example, consider

$$L_\vee=\Big\{w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ w=\mathtt{a}v \text{ for a word } v \in\{\mathtt{a},\mathtt{b}\}^* \text{ or } |w|_{\mathtt{b}} \text{ is divisible by } 3\Big\}$$

Of course, we can split $\Sigma^*$ into six equivalence classes that suffice to handle $L_\vee$: $$\{\text{starts\_with\_}\mathtt{a},\ \text{starts\_with\_}\mathtt{b}\} \times \{\text{rem\_}0,\ \text{rem\_}1,\ \text{rem\_}2\},$$ but fewer suffice: the remainder of $|w|_{\mathtt{b}} \bmod 3$ matters only if $w$ does not start with $\mathtt{a}$, so the three cells with $\text{starts\_with\_}\mathtt{a}$ collapse into one:

$$\{ \text{sw}\mathtt{a},\ \text{sw}\mathtt{b}\text{\_rem\_}0,\ \text{sw}\mathtt{b}\text{\_rem\_}1,\ \text{sw}\mathtt{b}\text{\_rem\_}2 \}.$$

(Strictly speaking, the empty word again forms a class of its own, just as $[\epsilon]$ did in your example, so the minimal automaton has five states in total.)
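The same pipeline makes this collapse visible. Assuming the hypothetical helpers from the earlier sketches (`product_dfa`, `step`, `nerode_classes`), only the acceptance test changes:

```python
# L_or: w starts with a, OR |w|_b is divisible by 3 (note: epsilon is in L_or).
states, delta, accepting = product_dfa(
    "ab", (None, 0), step, lambda s: s[0] == "a" or s[1] == 0)
for block in nerode_classes(states, "ab", delta, accepting):
    print(sorted(block, key=str))
# -> five classes: the three (a, rem) cells merge into one, while
#    (None, 0) and the three (b, rem) cells each stay separate.
```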

Closing word:

There are a lot of things to consider, and this is not the place for me to dwell on them; however, if you start thinking this way, a lot of structure will "pop out". One nice example to practice on is

$$L_{10} = \Big\{ w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ \text{the 10-th letter of $w$ from the back is }\mathtt{a}\Big\}.$$

It is easy to construct an NFA for $L_{10}$ (guessing the position of the $10$-th letter from the back takes only $11$ states), but any DFA will have a lot of states. Yet, if you think about it, this makes a lot of sense: the information the syntactic monoid wants you to remember contains at least $10$ bits, namely whether the $i$-th letter from the back is $\mathtt{a}$ or not, for each $i \le 10$. In other words, you need a way to represent a $10$-bit number, so you need at least $2^{10}$ different equivalence classes, i.e. states, in any deterministic finite automaton that recognizes $L_{10}$.
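The $2^{10}$ bound is easy to check by brute force: any two distinct $10$-letter words can be told apart by some suffix, so no DFA may merge them. A small sketch (the `in_L10` membership test is my own, not anything from the post):

```python
from itertools import product

def in_L10(w):
    # Membership test: the 10th letter from the back is 'a'.
    return len(w) >= 10 and w[-10] == "a"

windows = ["".join(p) for p in product("ab", repeat=10)]
for u in windows:
    for v in windows:
        if u == v:
            continue
        # u and v differ at some offset i; append i letters so that
        # position i becomes the 10th letter from the back.
        i = next(k for k in range(10) if u[k] != v[k])
        assert in_L10(u + "a" * i) != in_L10(v + "a" * i)
print(len(windows), "pairwise distinguishable words")   # -> 1024
```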

So that was one basic trick, with some hints on how to extend it. It's not magic, but it simplified things a lot for me; perhaps you will like this approach too.

I hope this helps $\ddot\smile$
