[Math] Set-theoretic foundations for formal language theory

set-theory

Has anyone ever seen any papers or books including set-theoretic descriptions of formal language theory? Specifically, I'm interested in how one would formalize context-free grammars with sets.

Some of this, I suppose is fairly obvious. For example, strings would use a foundational formalism much like ordered pairs (e.g. Kuratowski's definition or similar) but what about objects like production rules and their semantics?

This isn't really necessary for me to get any actual work done, I just thought it would help me build a better intuition around formal language theory.

Thanks in advance,

  • Anthony

Update:

An example of what I'm looking for would be string concatenation. How would one construct this in pure set theroy? If we consider the strings "a" and "b", and consider their set representations to be { { a } } and { { b }} (as per Kuratowski), then the desired result would be { { a }, { a, b } }. Clearly this cannot be accomplished by a defining string concatenation to be set union but there are obviously numerous ways of doing so. The choice at each point in defining how to do something in formal language theory using pure set theory will, in some cases, determine how other notions can be defined. As a consequence, some definitions may be more complex than others (or less elegant than others, one might say). I'm just curious if anyone has done anything like this before.

Best Answer

One can do this using less technology, too...

Let $\Sigma$ be an alphabet, $N$ a set of non-terminals, and $\Sigma^\*$ and $(\Sigma\cup N)^\*$ the full languages on $\Sigma$ and $\Sigma\cup N$, respectively. A context-free grammar is a finite subset $G\subset N\times(\Sigma\cup N)^\*$. Given one such grammar $G$ there is a relation $\mathord\rightarrow_G\subseteq(\Sigma\cup N)^\*\times(\Sigma\cup N)^\*$ which is the least transitive reflexive relation which contains $G$ (notice that $N\times(\Sigma\cup N)^\*\subseteq (\Sigma\cup N)^\*\times(\Sigma\cup N)^\*$, so this makes sense) and such that $$a\rightarrow_Gb \wedge a'\rightarrow_Gb'\implies ab\rightarrow_Ga'b'.$$ The language generated by $G$ from a non-terminal $n\in N$ is just $L(G, n)=\{w\in \Sigma^\*:n\rightarrow_Gw\}$. This is, in fact, the standard way to do this...

Related Question