Prove that if $L$ is regular then $\mbox{Copy}(L)$ is regular

automataformal-languagesregular-language

I've been doing this question for a couple of hours, still stuck.

For any string $w$, $O(w)$ is defined as a string that results from copying all the letters that occur in odd positions of $w$.

For example, $O (a) = aa$, $O (ab) = aab$, $O (acb) = aacbb$, and $O (acbd) = aacbbd$.

The definition of language is as follows. Let $L$ be a language over $\Sigma = \{a,b\}$, define $\mbox{Copy} (L)$ to be the language $\{ O (w) : w \in L \}$. Prove that if $L$ is regular then $\mbox{Copy} (L)$ is regular.

I know there has to be some kind of equivalence shown between DFA of $L$ and NFA of Copy(L). But I can't wrap my head around it.

Best Answer

For these types of questions, it can be useful to first think of an algorithm to determine whether a given string is in the desired language -- with the restriction that you should process one element of the string at a time, and maintain only a finite amount of state at each step.

In this case, let us consider the following rough pseudocode:

dfa = DFA of L;
loop:
  -- at an odd position
  match read() {
    SuccessfulRead(x) -> {
      match read() {
        SuccessfulRead(x) -> { dfa.transition(x); }
        SuccessfulRead(y) where y != x -> { return false; }
        EndOfString -> { return false; }
      }
    }
    EndOfString -> { return dfa.accepts(); }

  -- at an even position
  match read() {
    SuccessfulRead(x) -> { dfa.transition(x); }
    EndOfString -> { return dfa.accepts(); }
  }

  goto loop

From here, we need to see what the state is at each execution of read(). One part of the state is implicitly what line of code the execution is at; the other part is the state of variables at that line of code. So, in this case, if the states of the DSA are $S$, we have three different types of states for the DFA to accept $\operatorname{Copy}(L)$:

  • $OddPosFirstRead(s \in S)$: first read call
  • $OddPosSecondRead(s \in S, x \in \Sigma)$: second read call
  • $EvenPos(s \in S)$: third read call

If your formulation of a deterministic finite automaton does not provide a built-in way to signal errors (as in the early return of false in getting an unequal second letter at an odd position), then you will need to add an $Error$ state which always transitions to itself, and is not an accepting state. (In terms of the pseudocode, this can be viewed as imposing the restriction that the code must read all characters of the string until reaching the end, even if it knows before the end what the result will be. To fix the above pseudocode to meet this requirement, you could add an error: loop which just reads in and throws away characters until reaching the end, then returns false; then change the early return false to goto error.)

In other words, if $L$ is accepted by a DFA with states $S$, then we set \begin{align*} S' := & \{ OddPosFirstRead(s) \mid s \in S \} \sqcup \\ & \{ OddPosSecondRead(s, x) \mid s \in S, x \in \Sigma \} \sqcup \\ & \{ EvenPos(s) \mid s \in S \} \sqcup \\ & \{ Error \}. \end{align*} From here, I will leave it as an exercise to fill in the details of what the initial state is, what the state transitions are, and which states are accepting -- in terms of a given DFA and its initial state, state transitions, and accepting states.

(It might be interesting to note that in many modern programming languages which allow you to write coroutines, the compilers for these languages will internally do much the same sort of translation of the code into a state machine, in order to compile the coroutines into machine code. For example, think of the above code, but with each occurrence of read() replaced with await read().)