When investigating regular languages, regular expressions are obviously a useful characterisation, not least because they are amenable to nice inductions. On the other hand ambiguity can get in the way of some proofs.
Every regular language is recognized by an unambiguous context-free grammar (take a deterministic automaton which recognises it, and make a production $R \rightarrow tS$ for every edge $R \stackrel{t}{\rightarrow} S$ in the DFA, and $R \rightarrow \epsilon$ for every accepting state $R$).
On the other hand, the natural "grammar" for a regular language is its regular expression. Can these be made unambiguous?
To be precise, let's define a parse for a regular expression (this is I think a natural definition, but not one I've seen named before).
- $x$ is an $x$-parse of $x$, if $x$ is a symbol or $x=\varepsilon$
- $(y, 0)$ is an $R\cup R'$-parse of $x$, if $y$ is an $R$-parse of $x$
- Similarly, $(y,1)$ is an $R\cup R'$-parse of $x$, if $y$ is an $R'$-parse of $x$
- $(y_1, y_2)$ is an $RR'$-parse for $x_1x_2$, if $y_i$ is an $R$-parse for $x_i$ for $i=1,2$
- $[]$ is an $R^*$-parse for $\varepsilon$
- $[y_1, y_2, \dots, y_n]$ is an $R^*$-parse of $x_1x_2\cdots x_n$, if $y_i$ is an $R$-parse for $x_i$ for $1 \le i \le n$
In short, the parses of a string tell us how a regular expression matches a string if it does.
A regular expression $R$ is unambiguous if, for every $x \in L(R)$, there is only one $R$-parse of $x$.
Given a regular expression, is there an unambiguous regular expression which matches the same language?
Best Answer
There's a standard construction of a regular expression from a DFA: define an expression R(i,j,k) for the language of strings that take state i to state j of the DFA while using intermediate states that belong only to the subset of states from state 1 to state k, as follows.
The first two of these rules obviously give you unambiguous expressions, the third is unambiguous because there's only one way of parsing the string into substrings as described, and the fourth is unambiguous because any given string can only go to a single accepting state. Therefore, the final regular expression constructed in this way is unambiguous.