Showing that the language of prefixes $\operatorname{pre}\mathcal L$ is regular by using the definition of a regular language

A language is regular, if it is generated by a regular expression, meaning the expression consists of the alphabet $\Sigma_{\mathrm{RE}} = \Sigma \cup \{\epsilon, \varnothing, +,\ast,(, )\}$, and is formed only though union, concatenation and concatenation closure, as in

$\mathcal L(\epsilon) = \{\epsilon\}$ is a regular language and $\epsilon$ the corresponding regular expression
$\mathcal L(\varnothing) = \emptyset$ is a regular language and the symbol $\varnothing$ the corresponding regular expression.
For every $\alpha\in\Sigma$, $\mathcal L(\alpha) = \{\alpha\}$ is a regular language and $\alpha$ the corresponding regular expression.
For any two regular expressions $R$ and $S$, $\mathcal L(R + S) = \mathcal L(R) \cup \mathcal L(S)$ is a regular language and $R + S$ the corresponding regular expression.
For any two regular expressions $R$ and $S$, $\mathcal L(R\ast S) = \mathcal L(R) \ast \mathcal L(S)$ is a regular language and $R \ast S$ the corresponding regular expression.
The language $\mathcal L(R^\ast) = \mathcal L(R)^\ast$ is regular and $R^\ast$ the corresponding regular expression.

The language of prefixes is defined as
$$
\operatorname{pre}\mathcal L = \{x \in \operatorname{pre} y \mid y \in \mathcal L\} \,.
$$
If the language $\mathcal L$ is regular, show that $\operatorname{pre}\mathcal L$ is regular, using the above definition of regularity.

My current understanding

It seems to me that I would need to find out the prefixes of each language in the definition above, and then come up with a regular expression that generates them. So here are my thoughts:

Since $\operatorname{pre}\epsilon = \epsilon$, we have $\operatorname{pre}\mathcal L(\epsilon) = \mathcal L(\epsilon) = \{\epsilon\}$
Since $\operatorname{pre}\varnothing = \varnothing$, we have $\operatorname{pre}\mathcal L(\varnothing) = \mathcal L(\varnothing) = \emptyset$
$\operatorname{pre}\mathcal L(\alpha) = \mathcal L(\alpha) = \{\alpha\}$ for all $\alpha\in\Sigma$, as $\operatorname{pre}\alpha = \alpha$.
This is where it gets trickier. The language
$$
\operatorname{pre}\mathcal L(R + S)
= \operatorname{pre}( \mathcal L(R) \cup \mathcal L(S) )
= \{x \in \operatorname{pre} y \mid y \in \mathcal L(R) \cup \mathcal L(S) \}\,.
$$
It looks like the regular expression $\operatorname{pre} R + \operatorname{pre} S$ would cover this part.
Similarly to item 4, we have
$$
\operatorname{pre}\mathcal L(RS)
= \operatorname{pre}( \mathcal L(R) \mathcal L(S) )
= \{x \in \operatorname{pre} y \mid y \in \mathcal L(R) \mathcal L(S) \}\,,
$$
so the regular expression $\operatorname{pre}(RS)$ looks appropriate.
With the Kleene closure,
$$
\operatorname{pre}\mathcal L(R^\ast)
= \operatorname{pre}(\mathcal L(R)^\ast)
= \{x \in \operatorname{pre} y \mid y \in \mathcal L(R)^\ast\}\,.
$$
Here the regular expression $\operatorname{pre} R^\ast$ looks like it might work.

But I guess I still need to prove each of these. The first 3 items were obvious, but how do I show that the regular expressions actually generate the languages described?

Best Answer

I denote the regexp operator $+$ of union as $|$, and I omit the sign of composition $\circ$.

We are performing induction on the regular expression. The base cases are 1. 2. 3., but we get $\def\pre{\rm pre} \pre(\alpha)=\epsilon|\alpha$.

In what follows, we assume that $\pre(R)$ and $\pre(S)$ are already defined.

For 4., set $\pre(R|S):=\pre(R)\, |\, \pre(S)$.

For 5., set $\pre(RS):=\pre(R)\,|\, (R\, \pre(S))$.

For 6., set $\pre(R^*)=(R^*)\,\pre(R)$.

For a specific example, we have \begin{align} \pre\left((\alpha|\beta)^* \, \gamma\right) &= \left(\pre((\alpha|\beta)^*) \, \big|\, (\alpha|\beta)^*\, \pre(\gamma)\right) \\ &=\left((\alpha|\beta)^*\, \pre(\alpha|\beta)\, \big| \, (\alpha|\beta)^*\, (\epsilon|\gamma)\right) \\ &=\left((\alpha|\beta)^*\,( \pre(\alpha)\, |\, \pre(\beta))\, \big| \, (\alpha|\beta)^*\, (\epsilon|\gamma)\right)\\ &=\left((\alpha|\beta)^*\, ((\epsilon|\alpha)\, |\, (\epsilon|\beta))\, \big| \, (\alpha|\beta)^*\, (\epsilon|\gamma)\right) \,. \end{align} (which has nevertheless the same language as $(\alpha|\beta)^*(\epsilon|\gamma)$, so in specific examples the result of the above process might be 'simplified' by a shorter equivalent reg.exp).

My current understanding

Best Answer

Related Solutions

[Math] Proving that $L= \{a^nb^n, n\ge 0\}$ is not a regular language.

Proof that a language is a regular according to another language

Related Question