[Math] Encode lambda calculus in arithmetic

Tags: computability, lambda-calculus

There is plenty of information about how to encode arithmetic in the lambda calculus; the Wikipedia article on Church encoding seems complete to my inexpert eye. My question is about the other direction: what's a good way to encode the lambda calculus in ordinary real or integer (or even matrix) arithmetic? It would seem that an application of Gödel numbering would do it, but I have been unable to find any writing on Gödel numbering for the lambda calculus specifically. It also seems plausible to encode the lambda calculus in arithmetic in some way more efficient than Gödel numbering; perhaps someone has done that? I might settle for the SKI calculus or some other Turing-complete language that's easier to encode than the lambda calculus.

Best Answer

It's not a question where people are typically much concerned with efficiency. When one contemplates implementing the lambda calculus in practice, the target model is usually not one of arithmetic on a single huge integer at a time, but a more realistic model such as a RAM with bounded word size. And then you would be thinking in terms of pointers and linked data structures (or perhaps even byte sequences) rather than arithmetization.

On the other hand, for purely theoretical considerations, Gödel numbers do the job well enough that an improved scheme is not really a publishable matter.

However, for "recreational" purposes, what you would do is start by selecting some pairing function $f:\mathbb N\times\mathbb N\hookrightarrow\mathbb N$. There are various choices which strike different balances between representation efficiency and how cumbersome they are to encode and decode.
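For concreteness, one common choice is the Cantor pairing function $f(m,n)=\frac{(m+n)(m+n+1)}{2}+n$, which is a bijection $\mathbb N\times\mathbb N\to\mathbb N$. A minimal Python sketch (the function names are mine):

```python
from math import isqrt

def pair(m, n):
    # Cantor pairing: a bijection from N x N to N.
    return (m + n) * (m + n + 1) // 2 + n

def unpair(z):
    # Invert the pairing by recovering the diagonal w = m + n.
    w = (isqrt(8 * z + 1) - 1) // 2
    n = z - w * (w + 1) // 2
    return w - n, n
```

Other choices (e.g. bit interleaving, or $2^m(2n+1)-1$) trade off differently between growth rate and ease of inversion.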

Once you have that, one plan could be to represent lambda terms with de Bruijn indices:

$$ \begin{align} [\![ M N ]\!] &= 3 f( [\![ M ]\!], [\![ N ]\!] ) \\ [\![ \lambda . M ]\!] &= 1+3 [\![ M ]\!] \\ [\![ v^n ]\!] &= 2 + 3n \end{align} $$ (Remember that with de Bruijn indices, the variable in a lambda binder is implicit, and an occurrence of a variable is represented by the number of other binders nested between the binding and bound occurrences of the variable. Here I'm writing that as $v^n$, meaning "the variable bound $n$ levels above here"; some sources, such as the Wikipedia article, just use the number $n$ itself as the syntax for a variable occurrence.)
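This scheme is easy to mechanize; here is a Python sketch, with the Cantor pairing function standing in for $f$ and terms represented as nested tuples (both representation choices are mine):

```python
from math import isqrt

def pair(m, n):
    # Cantor pairing, playing the role of f.
    return (m + n) * (m + n + 1) // 2 + n

def unpair(z):
    w = (isqrt(8 * z + 1) - 1) // 2
    n = z - w * (w + 1) // 2
    return w - n, n

# Terms: ('app', M, N), ('lam', M), or ('var', n) with de Bruijn index n.
def encode(t):
    if t[0] == 'app':
        return 3 * pair(encode(t[1]), encode(t[2]))
    if t[0] == 'lam':
        return 1 + 3 * encode(t[1])
    return 2 + 3 * t[1]

def decode(k):
    # The residue mod 3 tells us which production we are looking at.
    if k % 3 == 0:
        m, n = unpair(k // 3)
        return ('app', decode(m), decode(n))
    if k % 3 == 1:
        return ('lam', decode(k // 3))
    return ('var', k // 3)
```

For example, the identity $\lambda.v^0$ encodes to $1 + 3(2 + 3\cdot 0) = 7$.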

De Bruijn notation is convenient for manipulating lambda terms, because it abstracts away from $\alpha$-conversion, but if you would rather represent lambda terms closer to how it looks written down, you could do $$ \begin{align} [\![ M N ]\!] &= 3 f( [\![ M ]\!], [\![ N ]\!] ) \\ [\![ \lambda x_n . M ]\!] &= 1+3 f( n, [\![ M ]\!] ) \\ [\![ x_n ]\!] &= 2 + 3n \end{align} $$

Alternatively, you could represent a combinator calculus such as $\mathbf{SKI}$ like you propose: $$ \begin{align} [\![ \mathbf I ]\!] &= 0 \\ [\![ \mathbf K ]\!] &= 1 \\ [\![ \mathbf S ]\!] &= 2 \\ [\![ M N ]\!] &= 3 + f( [\![ M ]\!], [\![ N ]\!] ) \end{align} $$
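The combinator encoding works out the same way; a Python sketch, again with the Cantor pairing function standing in for $f$ (representation choices are mine). Note that atoms occupy the codes $0,1,2$ and every application gets a code $\geq 3$, so decoding is unambiguous:

```python
from math import isqrt

def pair(m, n):
    # Cantor pairing, playing the role of f.
    return (m + n) * (m + n + 1) // 2 + n

def unpair(z):
    w = (isqrt(8 * z + 1) - 1) // 2
    n = z - w * (w + 1) // 2
    return w - n, n

# Terms: 'I', 'K', 'S', or a tuple (M, N) for application.
def enc(t):
    if t in ('I', 'K', 'S'):
        return {'I': 0, 'K': 1, 'S': 2}[t]
    return 3 + pair(enc(t[0]), enc(t[1]))

def dec(k):
    if k < 3:
        return 'IKS'[k]
    m, n = unpair(k - 3)
    return (dec(m), dec(n))
```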

The possibilities are practically endless. And note that all of this makes as much sense for representing logical formulas as it does for lambda terms. The prevalence of prime factorizations in the literature about arithmetization even in logic settings is to a large extent just a matter of inertia and the legacy of Gödel's original work. From a technical standpoint a more structured encoding such as the above ones would work just as well.

One point in favor of using an encoding based on symbol sequences (such as traditional Gödel numbers) rather than one based on tree structures (like the ones above) is that it -- supposedly -- makes it easier to convince an uninitiated reader that what goes on in the arithmetic is indeed exactly the same as what one can do mechanically with symbols on paper. However, that depends on the background of the assumed reader. For example, if one is writing for a computer-science audience, going directly to trees will very likely feel more natural to them.
