Showing that the language of prefixes $\operatorname{pre}\mathcal L$ is regular by using the definition of a regular language

regular expressionsregular-language

A language is regular, if it is generated by a regular expression, meaning the expression consists of the alphabet $\Sigma_{\mathrm{RE}} = \Sigma \cup \{\epsilon, \varnothing, +,\ast,(, )\}$, and is formed only though union, concatenation and concatenation closure, as in

  1. $\mathcal L(\epsilon) = \{\epsilon\}$ is a regular language and $\epsilon$ the corresponding regular expression

  2. $\mathcal L(\varnothing) = \emptyset$ is a regular language and the symbol $\varnothing$ the corresponding regular expression.

  3. For every $\alpha\in\Sigma$, $\mathcal L(\alpha) = \{\alpha\}$ is a regular language and $\alpha$ the corresponding regular expression.

  4. For any two regular expressions $R$ and $S$, $\mathcal L(R + S) = \mathcal L(R) \cup \mathcal L(S)$ is a regular language and $R + S$ the corresponding regular expression.

  5. For any two regular expressions $R$ and $S$, $\mathcal L(R\ast S) = \mathcal L(R) \ast \mathcal L(S)$ is a regular language and $R \ast S$ the corresponding regular expression.

  6. The language $\mathcal L(R^\ast) = \mathcal L(R)^\ast$ is regular and $R^\ast$ the corresponding regular expression.

The language of prefixes is defined as
$$
\operatorname{pre}\mathcal L = \{x \in \operatorname{pre} y \mid y \in \mathcal L\} \,.
$$

If the language $\mathcal L$ is regular, show that $\operatorname{pre}\mathcal L$ is regular, using the above definition of regularity.

My current understanding

It seems to me that I would need to find out the prefixes of each language in the definition above, and then come up with a regular expression that generates them. So here are my thoughts:

  1. Since $\operatorname{pre}\epsilon = \epsilon$, we have $\operatorname{pre}\mathcal L(\epsilon) = \mathcal L(\epsilon) = \{\epsilon\}$

  2. Since $\operatorname{pre}\varnothing = \varnothing$, we have $\operatorname{pre}\mathcal L(\varnothing) = \mathcal L(\varnothing) = \emptyset$

  3. $\operatorname{pre}\mathcal L(\alpha) = \mathcal L(\alpha) = \{\alpha\}$ for all $\alpha\in\Sigma$, as $\operatorname{pre}\alpha = \alpha$.

  4. This is where it gets trickier. The language
    $$
    \operatorname{pre}\mathcal L(R + S)
    = \operatorname{pre}( \mathcal L(R) \cup \mathcal L(S) )
    = \{x \in \operatorname{pre} y \mid y \in \mathcal L(R) \cup \mathcal L(S) \}\,.
    $$

    It looks like the regular expression $\operatorname{pre} R + \operatorname{pre} S$ would cover this part.

  5. Similarly to item 4, we have
    $$
    \operatorname{pre}\mathcal L(RS)
    = \operatorname{pre}( \mathcal L(R) \mathcal L(S) )
    = \{x \in \operatorname{pre} y \mid y \in \mathcal L(R) \mathcal L(S) \}\,,
    $$

    so the regular expression $\operatorname{pre}(RS)$ looks appropriate.

  6. With the Kleene closure,
    $$
    \operatorname{pre}\mathcal L(R^\ast)
    = \operatorname{pre}(\mathcal L(R)^\ast)
    = \{x \in \operatorname{pre} y \mid y \in \mathcal L(R)^\ast\}\,.
    $$

    Here the regular expression $\operatorname{pre} R^\ast$ looks like it might work.

But I guess I still need to prove each of these. The first 3 items were obvious, but how do I show that the regular expressions actually generate the languages described?

Best Answer

I denote the regexp operator $+$ of union as $|$, and I omit the sign of composition $\circ$.

We are performing induction on the regular expression. The base cases are 1. 2. 3., but we get $\def\pre{\rm pre} \pre(\alpha)=\epsilon|\alpha$.

In what follows, we assume that $\pre(R)$ and $\pre(S)$ are already defined.

For 4., set $\pre(R|S):=\pre(R)\, |\, \pre(S)$.

For 5., set $\pre(RS):=\pre(R)\,|\, (R\, \pre(S))$.

For 6., set $\pre(R^*)=(R^*)\,\pre(R)$.


For a specific example, we have \begin{align} \pre\left((\alpha|\beta)^* \, \gamma\right) &= \left(\pre((\alpha|\beta)^*) \, \big|\, (\alpha|\beta)^*\, \pre(\gamma)\right) \\ &=\left((\alpha|\beta)^*\, \pre(\alpha|\beta)\, \big| \, (\alpha|\beta)^*\, (\epsilon|\gamma)\right) \\ &=\left((\alpha|\beta)^*\,( \pre(\alpha)\, |\, \pre(\beta))\, \big| \, (\alpha|\beta)^*\, (\epsilon|\gamma)\right)\\ &=\left((\alpha|\beta)^*\, ((\epsilon|\alpha)\, |\, (\epsilon|\beta))\, \big| \, (\alpha|\beta)^*\, (\epsilon|\gamma)\right) \,. \end{align} (which has nevertheless the same language as $(\alpha|\beta)^*(\epsilon|\gamma)$, so in specific examples the result of the above process might be 'simplified' by a shorter equivalent reg.exp).

Related Question