[Tex/LaTex] How does your mind bend \expandafter to its will

best-practices, expansion, learning, tex-core

Expansion is often cited as one of the most arcane aspects of TeX, more akin to witchcraft than to something easily picked up by the newcomer. There are many great questions and great answers about expansion on the site but, although I like to think I'm getting better at it, I still find myself stumped by more complicated cases.


For instance, take the LaTeX internal called \in@. Martin Scharrer's excellent List of internal LaTeX2e macros explains \in@ as follows:

\in@{⟨1⟩}{⟨2⟩}
Checks if first argument occurs in the second and sets the switch \ifin@ accordantly. The arguments are not expanded. This must be done beforehand.

Because I'd like to pass once-expandable macros as arguments to \in@, I need to expand each of those arguments once before \in@ gets expanded.

I managed to handle the case where the first argument needs to be expanded once and the second argument doesn't need expansion (see my code below).

However, I can't get my head around the \expandafter juggling required to handle the case in which both arguments need to be expanded once. I think (correct me if I'm wrong) that I have to leave the braces untouched until \in@ itself is processed, and that's what I find difficult. I'd appreciate your help on this particular expansion problem.

\documentclass{article}

\let\ex=\expandafter % <--- more readable \expandafter

\begin{document}
\makeatletter

\in@{foo}{foo,bar,baz} % <--- works as expected
\ifin@ true\else false\fi

\def\foo{foo}
\ex\in@\ex{\foo}{foo,bar,baz} % <--- so far, so good...
\ifin@ true\else false\fi

\def\cslist{foo,bar,baz}
\in@{\foo}{\cslist} % <--- What combination of \expandafter is needed here?
\ifin@ true\else false\fi

\makeatother
\end{document}

However, more generally, I'd like to know what general procedure expansion gurus follow in their head (and possibly on paper) to expand a sequence of tokens in the desired order. Stephan v. Bechtolsheim's A tutorial on \expandafter gave me some insight, but I'm still far from mastering it. I think sharing your tricks and recipes might help me and others improve our expansion skills.

So, what cognitive processes allow you to bend \expandafter to your will?

Note: I insist on using only \expandafter here, and none of that \edef / \noexpand trickery. :)

Best Answer

\in@{\foo}{\cslist} % <--- What combination of \expandafter is needed here?

If \foo were expanded first, we would have a problem: \expandafter cannot jump over several tokens at once, and the number of tokens in the expansion of \foo is not known in advance. Therefore the last token, \cslist, is expanded first. But at this stage we cannot yet add its \expandafter chain, because the \expandafter tokens for \foo have to be inserted first:

\ex\in@\ex{\foo}{\cslist}

Then we add the outermost \expandafter chain to expand \cslist: one new \expandafter goes in front of every token that precedes \cslist. The next line uses \EX for the new \expandafter tokens to make the difference between the stages visible:

\EX\ex\EX\in@\EX\ex\EX{\EX\foo\EX}\EX{\cslist}

Result:

\ex\ex\ex\in@\ex\ex\ex{\ex\foo\ex}\ex{\cslist}
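
To check the two-stage construction, the chain can be dropped into the test document from the question. This is only a minimal sketch: \foo and \cslist are the definitions from the question, and the comments describe the order in which the two waves fire.

```latex
\documentclass{article}

\let\ex=\expandafter % shorthand, as in the question

\begin{document}
\makeatletter

\def\foo{foo}
\def\cslist{foo,bar,baz}

% Wave 1: the odd-numbered \ex tokens hop along the line and expand \cslist.
% Wave 2: the remaining \ex\in@\ex{\foo} expands \foo.
% Only then does \in@ read its two (now expanded) arguments.
\ex\ex\ex\in@\ex\ex\ex{\ex\foo\ex}\ex{\cslist}
\ifin@ true\else false\fi

\makeatother
\end{document}
```

With these definitions \in@ ends up comparing foo against foo,bar,baz, so the test should take the true branch.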

A more generic algorithm would be:

Establish an order of expansions:
- Collapsing expansions: We have to make sure that the number of tokens is known, so that we can insert \expandafter between them if we need to jump over them. Therefore constructs like \csname need to be expanded at an earlier level. This also allows arguments with an unknown number of tokens inside \csname, because the whole \csname construct becomes a single control sequence token after one expansion step.
  
  Note: See also the trick below, where \csname is used to expand material that comes after it.
- Expanding expansions, e.g. \cslist above: those further to the right have to come first, because we cannot jump over an unknown number of tokens.

Now we can add \expandafter chains from the start to the token that needs expanding. The order from the previous step is now reversed. First the chain for the token that is expanded last is inserted, e.g.:
```
0. \a\b\last\c\first
1. \EX\a\EX\b\last\c\first % \EX inserted
```
Then we go backwards in time to expand the token that needs expansion before the last:
```
=1. \ex\a\ex\b\last\c\first
 2. \EX\ex\EX\a\EX\ex\EX\b\EX\last\EX\c\first % \EX inserted
=2. \ex\ex\ex\a\ex\ex\ex\b\ex\last\ex\c\first
```
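
The final line of the scheme can be tried out with a few throwaway definitions. In this sketch the definitions of \a, \first and \last are hypothetical and chosen only to make the result visible in the log; \b and \c are never expanded here, they are simply picked up as arguments. Everything is kept inside a group so that the redefinition of \a stays local.

```latex
\documentclass{article}
\let\ex=\expandafter
\begin{document}
\begingroup % keep the illustrative redefinitions local
  % Hypothetical helper: writes its four arguments to the log, unexpanded.
  \def\a#1#2#3#4{\typeout{\detokenize{[#1][#2][#3][#4]}}}
  \def\first{F}% expanded in the first wave
  \def\last{L}%  expanded in the second wave
  % Line "=2." from above: \first is expanded, then \last, then \a runs.
  \ex\ex\ex\a\ex\ex\ex\b\ex\last\ex\c\first
\endgroup
\end{document}
```

By the time \a grabs its arguments, the second slot should hold L and the fourth F, while \b and \c arrive untouched.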

"Tricks"

Sometimes TeX helps to save some \expandafter.

Expanding after \csname:

Let's assume \foo and \cslist are not given explicitly but constructed via \csname:
```
\in@{\csname foo\endcsname}{\csname cslist\endcsname}
```
A naive approach would require four expansion waves:
1. expanding the first \csname to get one token \foo
2. expanding the second \csname to get \cslist
3. expanding \cslist
4. expanding \foo
Result: the line alone starts with 2⁴-1 (= 15) \expandafter tokens in front of \in@, with further chains needed in front of the later tokens.

This can be reduced: TeX expands the tokens between \csname and \endcsname until nothing expandable is left, in order to form a control sequence. The following uses this to get \cslist and its expansion before \foo is constructed:
```
\csname foo\ex\ex\ex\endcsname
\ex\ex\ex}\ex\ex\ex{\csname cslist\endcsname}
```
And the whole expression with the expansion of \foo:
```
\ex\ex\ex\in@\ex\ex\ex{\csname foo\ex\ex\ex\endcsname
\ex\ex\ex}\ex\ex\ex{\csname cslist\endcsname}
```
The result is 15 \expandafter tokens in total.
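
As a quick check, the whole expression can be placed in a test document. This is a sketch under the same assumptions as the question (\foo and \cslist defined as there, \ex as shorthand for \expandafter); the line break after \endcsname is harmless because TeX skips the end of line after a control word.

```latex
\documentclass{article}
\let\ex=\expandafter
\begin{document}
\makeatletter

\def\foo{foo}
\def\cslist{foo,bar,baz}

% The first \csname is triggered by the outer chain. While it looks for its
% \endcsname, the inner chains build \cslist and expand it. Then \foo is
% formed, and the remaining \ex\in@\ex{\foo} expands it.
\ex\ex\ex\in@\ex\ex\ex{\csname foo\ex\ex\ex\endcsname
\ex\ex\ex}\ex\ex\ex{\csname cslist\endcsname}
\ifin@ true\else false\fi

\makeatother
\end{document}
```

Here \in@ should again see foo and foo,bar,baz, so the true branch is expected.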

Expanding arguments of some TeX primitives such as \uppercase:

Let's assume \foo expands to a word that should be converted to uppercase:
```
\ex\uppercase\ex{\foo}
```
Here we can save the first \expandafter, because \uppercase already expands the next tokens until it gets the opening brace:
```
\uppercase\ex{\foo}
```
Other primitives: \detokenize, \scantokens, \message.

Caveat: If someone redefines \uppercase as a macro, this trick will obviously fail.
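
A small document makes the difference visible. It is only a sketch, with \foo defined as a hypothetical one-word macro; the second line is included to show what happens when \foo is not expanded before \uppercase acts.

```latex
\documentclass{article}
\let\ex=\expandafter
\begin{document}
\def\foo{word}

% \uppercase expands tokens while looking for the opening brace, so the
% single \ex fires and \foo is expanded before the text is uppercased.
\uppercase\ex{\foo}

% Without \ex, \uppercase only sees the token \foo, which it leaves alone;
% \foo is expanded afterwards and stays lowercase.
\uppercase{\foo}
\end{document}
```

The first paragraph should come out in capitals and the second in lowercase.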

Related Solutions

[Tex/LaTex] What does \begingroup\expandafter…\endgroup do

Let's look step by step

\begingroup\expandafter\expandafter\expandafter\endgroup
\expandafter\ifx\csname directlua\endcsname\relax
  A
\else
  B
\fi

This becomes

(\begingroup)\expandafter\endgroup
\ifx\directlua\relax
  A
\else
  B
\fi

The \begingroup has already been digested, so I leave it in parentheses just to remember a group has been opened. Another step, now, where we have to distinguish between cases.

Case 1: \directlua is not defined, so the token produced by \csname directlua\endcsname is equivalent to \relax.

(\begingroup)\endgroup A\else B\fi

Now \endgroup is digested and this removes the assignment of the meaning \relax to \directlua. A is examined, the expansion of \else B\fi is empty.

Case 2: \directlua is defined.

(\begingroup)\endgroup B\fi

Again \endgroup is digested, but does not restore anything. The expansion of \fi is empty.

Why not do this inside a group? The key point is that at the end \directlua is not defined if it wasn't at the start of the process. The same would be true if the code is

\begingroup\expandafter\ifx\csname directlua\endcsname\relax A\else B\fi\endgroup

However, the purpose of A and B is to do some assignments. In this case A would probably be \luatexfalse, after having said \newif\ifluatex before, and B would be \luatextrue. With the triple \expandafter, the group is already closed when A or B is executed, so no global assignment is needed. This follows the good practice that assignments to a variable should be always global or always local (so long as it's possible). Of course in this case a global assignment would not be that important; in other cases it might have consequences on the save stack.
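
Filled in with the assignments just mentioned, the construction might look as follows. This is a sketch, assuming that \ifluatex is not already provided by a loaded package and that \relax has its primitive meaning.

```latex
\documentclass{article}

\newif\ifluatex

% The group is closed before the conditional is evaluated, so the local
% "\directlua = \relax" made by \csname is undone, while \luatexfalse or
% \luatextrue is executed outside the group as an ordinary local assignment.
\begingroup\expandafter\expandafter\expandafter\endgroup
\expandafter\ifx\csname directlua\endcsname\relax
  \luatexfalse
\else
  \luatextrue
\fi

\begin{document}
\ifluatex This is LuaTeX.\else This is not LuaTeX.\fi
\end{document}
```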

The suggested alternative

{\expandafter}\expandafter\ifx\csname directlua\endcsname\undefined
  A
\else
  B
\fi

(with \undefined, not \relax) is less attractive, because it relies on a certain token to be undefined. One could object that the code we're analyzing assumes \relax has its primitive meaning, but some assumptions need to be made.

If e-TeX can be assumed, the simpler test

\ifdefined\directlua
  A
\else
  B
\fi

is even fully expandable.
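
Being fully expandable means the test can be used inside an \edef, for instance. A short sketch, where \enginename is a hypothetical macro name introduced only for this illustration:

```latex
\documentclass{article}
\begin{document}
% \ifdefined is evaluated while \edef expands its body, so \enginename
% stores only the text of the branch that was taken.
\edef\enginename{\ifdefined\directlua LuaTeX\else another engine\fi}
Running under \enginename.
\end{document}
```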

[Tex/LaTex] Why isn’t everything expandable

While a definitive answer can only come from the Stanford team involved in development of TeX, and from Professor Knuth in particular, I think we can see some possible reasons.

First, Knuth designed TeX primarily to solve a particular problem (typesetting The Art of Computer Programming). He made TeX sufficiently powerful to solve the typesetting problems he faced, plus the more general case he decided to address. However, he also kept TeX (almost) as simple as necessary to achieve this. While expandable macros are useful, they are not required to solve many issues.

Secondly, there are cases where an expandable approach would be at least potentially ambiguous. Bruno's \edef\foo{\def\foo{abc}} is a good case. I'd say that here the expected result with an expandable \def is that \foo expands to nothing, but I'd also say this is not totally clear. There is the much more common case where you want something like

\begingroup
\edef\x{%
  \endgroup
  \def\noexpand\foo{\csname some-macro-to-fully-expand\endcsname}%
}
\x

which would be made more complex with expandable primitives.
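
To make the pattern concrete, here is a small sketch in which \payload is a hypothetical macro standing in for the material to be fully expanded. It relies on \def not being expandable: inside the \edef only \payload is expanded, and the \def is carried out later, when \x is used.

```latex
\documentclass{article}
\begin{document}
\def\payload{abc}% hypothetical macro, value outside the group
\begingroup
  \def\payload{xyz}% local value, lost when the group ends
  \edef\x{%
    \endgroup
    \def\noexpand\foo{\payload}% \payload is expanded now, \def is not executed
  }%
\x % closes the group, then defines \foo outside it

\texttt{\meaning\foo}
\end{document}
```

The definition of \foo survives the group even though it is a local assignment, and it should carry the value \payload had inside the group.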

The above example points to another grey area: what would happen with things like \begingroup and, more importantly, \relax. The fact that the latter is a non-expandable no-op is often important in TeX programming. (Indeed, the fact that \numexpr, etc., gobble an optional trailing \relax is sometimes regarded as a bad thing.)

Finally, I suspect that ease of implementation is important. The approach of having separate expansion and execution steps makes the flow relatively easy to understand and, I suspect, to implement. An approach which mixes expansion and execution requires a more complex architecture. Here, we have to remember when Knuth was writing TeX, and the fact that programming ideas which we take for granted today were not necessarily applicable in the late 1970s. A fully expandable approach would, I suspect, have made the code more complex and slower. Speed mattered a great deal when TeX was running on the 'big' computers of the time.
