[Math] DFA – Union operation: How to

automatafinite-automatalogicregular-language

I'm currently looking at deterministic finite automata, and learning how to combine two DFAs using AND or OR. I think I understand how to construct the INTERSECTION (AND) of two DFAs, but I'm at a loss when it comes to constructing the UNION (OR) of the same DFAs.

For instance, assume we have language L defined as follows:

L = {w | w contains the substring ab (AND/OR) ba}

This can be thought of as two sub-languages, respectively:

L1 = {w | w contains the substring ab}
L2 = {w | w contains the substring ba} 

As such, we can construct DFAs for each language (namely L1 and L2), and combine them using either the AND og OR scheme.

AND – Intersection

For intersection (AND), the individual DFAs become as follows:

enter image description here

Furthermore, the intersection M = M(L1) AND M(L2), corresponding to the language

L = {w | w contains the substring ab AND ba}

would be represented by the following DFA:

enter image description here

This should, after what I gather, be correct. If not; please tell me where the faults are.

QUESTION

However, HOW does one construct the UNION of the same DFAs?

I have seen some examples/solutions online, such as the figure below, but I do not understand how they have reached this setup.

enter image description here

Any help on this is extremely appreciated!

Best Answer

The construction of DFAs via (left) derivatives seems not to be as commonly known as it should be; see this paper.

The left derivative of a language $L$ over an alphabet $\Sigma$, with respect to a symbol $a$, is $$D(L,a) = \{w\in\Sigma^*\mid aw\in L\}.$$ For a regex $r$, let $L$ be the language described by $r$; then $D(r,a)$ is a regex matching $D(L,a)$; one can be computed easily.

The states of your DFA are regexes; a regex $R$ transitions to $D(R,a)$ under symbol $a$. The initial state is the given regex, and the accept states are those regexes which match the empty string $\epsilon$. It's convenient to define derivatives with respect to strings via $D(L,aw)=D(D(L,a),w)$.

For the regex $R=\texttt{/.*(ab|ba).*/}$, we get the following: $$\begin{array}{c|c} w& D(R,w)\\ \hline \epsilon& R\\ a & R|\texttt{/b.*/}\\ b & R|\texttt{/a.*/}\\ ab & \texttt{/.*/}\\ ba & \texttt{/.*/}\\ \end{array}$$ As $D(\texttt{/.*/},s)=\texttt{/.*/}$ for any symbol $s$, no further states are created, so this results in a 4-state DFA for $R$, with one accept state corresponding to the language $\texttt{/.*/}$, the only state whose language contains $\epsilon$. This agrees with the picture you provided.

Related Question