Codification of a formal language in set theory.

formal-grammarformal-languagesformal-systemslogicset-theory

Starting with an arbitrary class of sets $\Gamma$, can you generate a free semigroup $\Gamma^*$ over $\Gamma$ with the group operation of concatenation ($\frown$)?

The goal here is to codify a formal language in terms of set theory.


The difficulty is in coming up with a set-theoretic operation that corresponds to concatenation such that it makes every new element resulting from concatenation unique, and is associative.

Given $a,b \in \Gamma$, the first thought would be to represent $a \frown b\frown c$ as a 3-tuple $<a,b,c>$. I know I can define tuples set-theoretically via $<a,b>:=\{\{a\},\{a,b\}\}$ but this will violate associativity in concatenation:

$$a \frown(b \frown c)=<a,<b,c>> \ne <<a,b>,c>=(a \frown b)\frown c$$


I have tried other variants but I haven't been able to come up with a set-theoretic description of concatenation that respects associativity, any ideas?

EDIT: This is a related question: https://mathoverflow.net/questions/12190/set-theoretic-foundations-for-formal-language-theory

unfortunately none of the answers provide an explicit definition of concatenation in set-theoretic terms.

Best Answer

There are a couple ways to address this. Equivalence classes give the most algebraically natural treatment:

  • Taking the naive definition of concatenation-as-ordered-pair, we get the free magma $\hat{\Gamma}$ on $\Gamma$.

  • Now the free semigroup will just be the free magma modulo the "associativity relation" - basically, we just need to whip up the binary relation $\sim$ describing when two elements of $\hat{\Gamma}$ "should be" equal. It's a bit messy to describe $\sim$ "explicitly" - this winds up being an inductive construction - but we can also define it as the smallest equivalence relation on $\hat{\Gamma}$ such that (or, the intersection of all equivalence relations on $\hat{\Gamma}$ such that) for all $a,b,c,d\in\hat{\Gamma}$ we have:

    • $a\sim b$ and $c\sim d$ implies $\langle a,c\rangle\sim\langle b,d\rangle$, and

    • $\langle a,\langle b,c\rangle\rangle\sim \langle\langle a,b\rangle, c\rangle$.

It's now easy to define the semigroup operation $\cdot$ as $$[a]_\sim\cdot[b]_\sim=[\langle a,b\rangle_\sim$$ (after, of course, checking that this is in fact well-defined). Or if you want to be really pedantic about it, given $\sim$-classes $E,F$, their product $E\cdot F$ is the unique $\sim$-class $G$ such that there are elements $a\in E$ and $b\in F$ such that $\langle a,b\rangle\in G$.


Another approach, less algebraically natural but perhaps more concrete, is via tuples as functions.

Specifically:

  • An element of $\Gamma^*$ will be a function $f$ such that $(1)$ the domain of $f$ is some natural number $n$, and $(2)$ the range of $f$ is $\subseteq\Gamma$. A function with domain $n$ is "morally" an $n$-tuple.

  • We can now define a fully associative version of concatenation using arithmetic. Specifically, suppose $f,g\in \Gamma^*$ with domains $m,n$ respectively. We let $f^\smallfrown g$ be the function $h$ given by: $dom(h)=m+n$, for $k<m$ we have $h(k)=f(k)$, and for $m\le k<n$ we have $h(k)=g(k-m)$.

    • So for example if $dom(f)=2, dom(g)=1$, $f$ sends $0$ to $0$ and $1$ to $1$, and $g$ sends $0$ to $3$, then $f^\smallfrown g$ has domain $3$, sends $0$ to $0$, sends $1$ to $1$, and sends $2$ to $3$.

(Remember that in set theory a natural number is just a finite ordinal, and in particular is just the set of smaller natural number; so e.g. "$dom(f)=5$" makes perfect sense.)

The only thing this relies on is arithmetic of finite ordinals, which is straightforward to develop.

Related Question