[Math] What are some properties that the regular languages are not closed under

automatacomputer sciencefinite-automata

In a standard Theory of Computation class, one learns a variety of closure properties of regular languages, including but not limited to: homomorphism, inverse homomorphism, union, complement, intersection, concatenation, kleene star, reversal, etc. 2-way NFA's are even equivalent to one-way DFA's, which brings with it its own host of closure properties. (For example, this implies $\operatorname{Double}(L)=\{x\mid xx \in L\}$ where $L \in \operatorname{Reg}$ is regular).

Of course, regular languages aren't closed under every closure property. One example is "closure under addition of all strings of the form $0^n1^n$," but the fact that this brings us outside the regular languages is trivial and uninteresting. Are there any "natural" closure properties that don't apply to the regular languages?

Also, if we extend the definition of regular languages to include these natural properties or some subset of them, would we get another "natural" family of languages or something uninteresting, like the family of languages consisting of all languages? To clarify, an extension of the definition of regular languages would be something like the smallest subset of the set of all languages closed under the standard closure properties of regular languages along with any new ones we choose.

An example (pointed out by Henning Makholm) would be $\operatorname{Half}(L)=\{xx\mid x\in L\}$. Does extending the family of regular languages to one closed under standard regular closure properties (say, all the ones listed in the first paragraph) along with the Half operator give any meaningful new family of languages? Have questions like these been studied by formal linguists?

EDIT: After further research, it seems that my question is heavily related to the study of "abstract families of languages." However, most articles on this subject are relatively inaccessible to a petty undergraduate like myself, and they don't address the essence of my question in trying catch as many closure properties as possible, so I remain unsatisfied.

Best Answer

The following classes of languages seem relevant to your question.

Context-free languages.

There is a very nice characterization of context-free languages (credited to Wechler) involving concatenation product and left quotients in [1]. More details can be found in this answer.

[1] J. Berstel and L. Boasson, Towards an algebraic theory of context-free languages, Fundam. Inform. 25(3): 217-239 (1996)

Rudimentary languages.

The class of rudimentary languages can be defined in various ways. I just mention here two equivalent possible definitions.

If $\mathcal{L}$ is a class of languages, let $\text{NLin}(\mathcal{L})$ denote the class of languages accepted by some nondeterministic Turing machine in linear time, with an oracle set in $\mathcal{L}$.

Let $\mathcal{L}_0 = \{ \emptyset \}$ be the trivial class of languages and for all $n \geqslant 0$, let $\mathcal{L}_{n+1} = \text{NLin}(\mathcal{L}_n)$. A language is rudimentary if it belongs to the union of the classes $\mathcal{L}_n$ for $n \geqslant 0$.

The following nice characterization is due to Wrathall [2].

Rudimentary languages form the smallest class of languages containing the linear language $\{0^n 1^n \mid n \geq 0\}$ and the regular sets and closed under the Boolean operations, inverses of morphisms and length-preserving morphisms.

[2] C. Wrathall, Rudimentary predicates and relative computation, SIAM J. Comput. 7,2 (1978), 194--209.

Related Question