[Math] Is the language “substrings of an even-lengthed regular language” also regular

automataformal-languagesregular-language

I want to prove that for a regular language $L$ where $\forall w \in L$ the length of $w$ is even, the language containing the first halves of the words of $L$ and the language containing the second halves are both regular.

That is, for the language containing only aabbab, the language containing aab is regular (as is the one containing bab).

I know that concatenation of two regular languages yields a regular language, and that the reverse is not always true. But with this restriction on length, I feel unequipped to answer this question for sure.

Best Answer

For any language $L$, let $L_{\frac12-}$ be the set of first halves of the words in $L$ and $L_{-\frac12}$ be the set of second halves of the words in $L$. Also let $L^R$ be the set of reversals of words in $L$.

If $L$ is regular then so is $L^R$. Since $L_{-\frac12} = ((L^R)_{\frac12-})^R$ it is sufficient to show that if $L$ is regular then so is $L_{\frac12-}$.

Let $M = (Q,\Sigma,\delta,q_0,F)$ be a DFA such that $L(M) = L$.

Construct a DFA $N$ as follows:

  1. The set of states of $N$ is $Q \times P(Q)$ where $P(Q)$ is the power set of $Q$.
  2. The start state of $N$ is $(q_0,F)$.
  3. The set of accept states of $N$ is $\{(q,S) \in Q \times P(Q) \mid q \in S\}.$
  4. Define a function $\gamma : P(Q) \to P(Q)$ by $$\gamma(S) = \{q \in Q \mid {\rm there\ exists\ } a \in \Sigma {\rm\ such\ that\ } \delta(q,a) \in S\}.$$ The transition function $\bar\delta$ of $N$ is defined by $\bar\delta((q,S),a) = (\delta(q,a), \gamma(S)).$

Then $L(N) = L_{\frac12-}$.

Proof

Suppose $(q_0,S_0),\ldots,(q_n,S_n)$ is the path taken by $N$ on input $w = w_1\cdots w_n$, and $(q_n,S_n)$ is an accept state. Then $q_0,\ldots,q_n$ is the path taken by $M$ on input $w$ and $q_n \in S_n$.

From the definition of $\bar\delta$, there exist $q_{n+1}, \ldots q_{2n} \in Q$ and $v_1,\ldots,v_n \in \Sigma$ such that $q_{n+i} = \delta(q_{n+i-1}, v_i) \in S_{n-i}$, so $q_0,\ldots,q_n,q_{n+1},\ldots q_{2n}$ is the path taken by $M$ on input $w_1\cdots w_nv_1 \cdots v_n$. Also $q_{2n} \in S_0 = F$ so $w$ is the first half of a word accepted by $M$.

Conversely suppose $wv \in L(M)$ and $|w|=|v|=n$. Let $q_0,\ldots,q_{2n}$ be the path taken by $M$ on input $wv$, and let $(q_0,S_0), \ldots, (q_n,S_n)$ be the path taken by $N$ on input $w$. Then $q_{2n} \in F = S_0$ and so it follows by induction on $i$ that $q_{2n-i} \in S_i$ for $0 \leq i \leq n$. In particular, $q_n \in S_n$ so $N$ accpets $w$.