Differences in book definitions of $\beta$-reduction leading to confusion on when an abstraction needs to be $\alpha$-converted

lambda-calculusprogramming

I've been learning Lambda Calculus in my free time recently to try and learn how to make programming languages & interpreters. I am struggling a little bit with some inconsistencies on different texts on when to use $\alpha$-conversion to avoid capture in $\beta$-reduction.

For example, in Lectures on the Curry-Howard Isomorphism, it makes no mention of any capture restrictions in its definition of $\beta$-reduction.
$$
\label{1}\tag{1}
(\lambda x.P)\, Q \rightarrow_\beta P[x := Q]
$$

However, in Write You A Haskell, it does

The fundamental issue with using locally named binders is the problem of name capture, or how to
handle the case where a substitution conflicts with the names of free variables. We need the condition in
the last case to avoid the naive substitution that would fundamentally alter the meaning of the following
expression when y is rewritten to x.

$\label{2}\tag{2}[y/x](\lambda x.xy) \rightarrow \lambda x.xx$

By convention we will always use a capture-avoiding substitution. Substitution will only proceed if the
variable is not in the set of free variables of the expression, and if it does then a fresh variable will be created in its place.

$\label{3}\tag{3}(\lambda x.e)\,a \rightarrow [x/a]e \quad\text{if}\quad x \notin fv(a)$

Which this piece makes sense to me that if you're substituting a free variable in the body of a lambda expression it can become captured, however the second part is what is throwing me off. The first part appears to be dealing specifically with substituting in lambda abstractions, where the second part dealing with "capture-avoiding substitution" is for any lambda expression.

So I'm not sure I understand why you would need to rename in the following case
$$
\label{4}\tag{4}
(\lambda x. x\ y)(\lambda y. y x) \rightarrow_\beta [x/x](x y)
$$

According to $\ref{3}$, this wouldn't be allowed because $x \in FV(\lambda y. y x)$, but I don't see the problem with it and it seems pretty straightforward to reduce this to $y x$.

When I plug into a lambda calculus interpreter online with steps (such as this one), it agrees and reduces it down to $y x$ with no renaming.

Renaming seems to only apply if the body of the abstraction of the $\beta$-redex is itself another lambda abstraction, such as
$$
\label{5}\tag{5}
(\lambda x.\lambda y. x y)(\lambda x. x y) \rightarrow_\beta [x/(\lambda x.x y)](\lambda y. x y)
$$

which seems to me also according to $\ref{3}$ would not require an alpha conversion because $x\notin FV(\lambda x.x y)$, however when I plug that into the same lambda calculus interpreter it does alpha convert.

One example that I worked out that makes perfect sense to me is the following (using the other notation/definition from Lectures on the Curry-Howard Isomorphism, side note: the differing notations confuse me also):
$$
\label{6}\tag{6}
\begin{align*}
(\lambda y.\lambda x.x y)(\lambda z.x z) &\rightarrow_\beta (\lambda x.x y)[y := (\lambda z. x z)] \\
&\rightarrow_\alpha (\lambda a. (xy)[x := a][y := (\lambda z. x z)]) &\text{if}\quad x\in FV(\lambda z. x z)\,\text{and}\, y\in FV(x y) \\
&\rightarrow (\lambda a.(a y)[y := (\lambda z. x z)] \\
&\rightarrow (\lambda a.a (\lambda z. x z))
\end{align*}
$$

and the condition makes sense because you need to rename $x$ if it's in the free variables of the substitution to avoid capture, and $y$ needs to be in the free variables of the body because you can only substitute free variables.

But then when I go back to the other examples I end up re-confusing myself on when I need to rename because it seems inconsistent.

In total, the question is

  1. What am I doing wrong in the examples that are confusing me using the Write You A Haskell ($\ref{3}$) definition?
  2. Am I correct that the renaming only applies when the $\beta$-redex abstraction body is itself another abstraction?

Best Answer

The rule from Write you a Haskell is incorrect. The correct rule is:

The reduction $(\lambda x . e) a \to_\beta [x/a]e$, where $x$ is a variable and $a$ and $e$ are terms, holds whenever the no free variable of $a$ is a bound variable in $e$. Here, $[x/a] e$ means a simple replacement of all occurences of $x$ in $e$ by $a$.

Whether $x$ is free in $a$ is completely irrelevant. What is relevant is that free occurences of a variable $t$ in the term $a$ do not become "captured" by $\lambda$-abstractions in $e$.

Your first reduction would thus proceed: $(\lambda x . x y) (\lambda y . y x) \to_\beta (\lambda y . y x) y \to_\beta yx$ with no further opportunities for reduction.

The reduction that confused you is that of $(\lambda x . \lambda y . xy) (\lambda x . xy)$. In this case, we see that $y$ is a free variable in $\lambda x . x y$ but $y$ is a bound variable in $\lambda y . xy$. Thus, it is not possible to do a $\beta$ reduction immediately: we must first do an $\alpha$-conversion. $(\lambda x . \lambda y . xy) (\lambda x . xy) \equiv_\alpha (\lambda x . \lambda z . xz) (\lambda x . xy) \to_\beta \lambda z . (\lambda x . xy) z \to_\beta \lambda z . zy$.

If you permitted an immediate $\beta$-reduction, you would get the following:

$(\lambda x . \lambda y . xy) (\lambda x . xy) \to_\beta \lambda y . (\lambda x . xy) y$

Notice that $y$ no longer occurs free. Continuing:

$\lambda y . (\lambda x . xy) y \to_\beta \lambda y . yy$

So we get that $\lambda z . zy \equiv \lambda y . yy$, which is clearly not what we expect intuitively. This is, of course, because we did an invalid $\beta$ reduction.