I'm currently looking at deterministic finite automata, and learning how to combine two DFAs using AND or OR. I think I understand how to construct the INTERSECTION (AND) of two DFAs, but I'm at a loss when it comes to constructing the UNION (OR) of the same DFAs.
For instance, assume we have language L defined as follows:
L = {w | w contains the substring ab (AND/OR) ba}
This can be thought of as two sub-languages, respectively:
L1 = {w | w contains the substring ab}
L2 = {w | w contains the substring ba}
As such, we can construct DFAs for each language (namely L1 and L2), and combine them using either the AND og OR scheme.
AND – Intersection
For intersection (AND), the individual DFAs become as follows:
Furthermore, the intersection M = M(L1) AND M(L2), corresponding to the language
L = {w | w contains the substring ab AND ba}
would be represented by the following DFA:
This should, after what I gather, be correct. If not; please tell me where the faults are.
QUESTION
However, HOW does one construct the UNION of the same DFAs?
I have seen some examples/solutions online, such as the figure below, but I do not understand how they have reached this setup.
Any help on this is extremely appreciated!
Best Answer
The construction of DFAs via (left) derivatives seems not to be as commonly known as it should be; see this paper.
The left derivative of a language $L$ over an alphabet $\Sigma$, with respect to a symbol $a$, is $$D(L,a) = \{w\in\Sigma^*\mid aw\in L\}.$$ For a regex $r$, let $L$ be the language described by $r$; then $D(r,a)$ is a regex matching $D(L,a)$; one can be computed easily.
The states of your DFA are regexes; a regex $R$ transitions to $D(R,a)$ under symbol $a$. The initial state is the given regex, and the accept states are those regexes which match the empty string $\epsilon$. It's convenient to define derivatives with respect to strings via $D(L,aw)=D(D(L,a),w)$.
For the regex $R=\texttt{/.*(ab|ba).*/}$, we get the following: $$\begin{array}{c|c} w& D(R,w)\\ \hline \epsilon& R\\ a & R|\texttt{/b.*/}\\ b & R|\texttt{/a.*/}\\ ab & \texttt{/.*/}\\ ba & \texttt{/.*/}\\ \end{array}$$ As $D(\texttt{/.*/},s)=\texttt{/.*/}$ for any symbol $s$, no further states are created, so this results in a 4-state DFA for $R$, with one accept state corresponding to the language $\texttt{/.*/}$, the only state whose language contains $\epsilon$. This agrees with the picture you provided.