[TeX/LaTeX] Why isn’t everything expandable?

Tags: expansion, macros, tex-core

TeX's macro processor does its work in a process called expansion. Given an input stream of tokens, the macro processor repeatedly expands them until only non-expandable tokens remain. The resulting stream of non-expandable tokens is passed to TeX's execution processor. Expansion can be viewed as a function call that is replaced, in place, by its result.

Macros absorb their arguments from the input stream and expand to their replacement text, with the arguments substituted in place. Other kinds of expandable tokens behave differently: for example, conditionals test their arguments (possibly expanding them, too) and skip the branch for which the condition is false.
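
For example (a minimal sketch; the names \double and \test are made up for this illustration):

\def\double#1{#1#1}
\edef\test{\double{ab}\ifnum1<2 yes\else no\fi}

Inside the \edef both \double and \ifnum are expandable, so \test ends up holding only non-expandable character tokens: ababyes.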

But there are also non-trivial tokens that are not expandable: most notably \def and register assignments (the latter are not actually single tokens, but see below). This means that they cannot be used in a macro to obtain a result through expansion: they are simply passed through untouched.

For example,

\edef\test{\def\a{x}\a}

will fail with "! Undefined control sequence.", because \def is not expanded but simply passed through, and when \a is then expanded it turns out to be undefined.
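
In real TeX one either performs the definition before the \edef, or uses \noexpand so that the tokens survive into \test and the definition is carried out only when \test is later executed (a minimal sketch of both arrangements):

\def\a{x}
\edef\test{\a}                             % \test -> x

\edef\test{\def\noexpand\a{x}\noexpand\a}  % \test -> \def\a{x}\a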

Likewise,

\newcount\mycount
\edef\test{\mycount=1}
\showthe\mycount

will show 0, not 1, because, again, nothing in \mycount=1 is expandable.
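
Note that reading a register is a different matter: \the is expandable, so a register's value can be fetched inside an \edef even though it cannot be assigned there. Continuing the example above:

\mycount=1
\edef\test{\the\mycount}  % \test -> 1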

One can imagine a system where such operations are expandable. More precisely, expanding \def would absorb a control sequence name, parameter text and replacement text from the input stream, define the new macro and expand to nothing. Similarly, an operation named \assign would read a register name and a value from the input, perform the assignment and expand to nothing. This could also be extended to \let, \advance, etc.

Thus the above examples would now behave differently: in the first \edef, \def would read in \a{x}, define \a and expand to empty text. After this expansion the token list would contain \a, which would then expand to x.

In the second example, now written as

\edef\test{\assign\mycount1}

\assign would set \mycount to 1 and expand to nothing. As a result, \test would be defined to be empty, but the value of \mycount would have been altered.

This new system would allow some things to be achieved in a more straightforward manner. For example, the problem of defining a macro that expands to n asterisks could now be solved with

\newcount\c
\def\asterisks#1{%
  \assign\c0
  \loop\ifnum\c<#1
    *%
    \advance\c by 1
  \repeat}

This would work because (see the definitions of \loop and \iterate) \def, \let and assignments would now all be expandable. Another substantial consequence would be that many more things could be done in macros whose result is passed as an argument to another macro. Observe how e-TeX's \numexpr and friends are already a considerable step in this direction.
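
Indeed, with e-TeX the asterisks problem above can already be solved expandably even without the hypothetical \assign, since \ifnum and \the\numexpr are expanded in the mouth. A possible sketch (the recursion pattern is standard; the name \asterisks is reused from the question, and the argument must be explicit digits):

\def\asterisks#1{%
  \ifnum#1>0 *%
    \expandafter\asterisks\expandafter{\the\numexpr#1-1\relax}%
  \fi}
\edef\stars{\asterisks{5}}  % \stars -> *****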

The question is: Why doesn't TeX implement such an approach, leaving instead some important operations non-expandable? What are the shortcomings of this approach and the advantages of TeX's implementation?

One possible reason might be that Knuth wanted macros to act as pure functions, incapable of changing the context they are being expanded in. A hint in this direction can be found in the TeXbook:

The expansion of expandable tokens takes place in TeX's "mouth," but
primitive commands (including assignments) are done in TeX's
"stomach." One important consequence of this structure is that it is
impossible to redefine a control sequence or to advance a register
while TeX is expanding the token list of, say, a \message or
\write command; assignment operations are done only when TeX is
building a vertical or horizontal or math list.
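
For instance (a small illustration of this point, reusing the \mycount register allocated earlier):

\message{\advance\mycount by 1}

expands the message text in the mouth; \advance is not expandable, so the terminal simply shows the literal tokens \advance \mycount by 1 and the register is left unchanged.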

Another reason might be that nested and/or recursive macro calls could interfere with each other if they had write access to "external" data available to them.

Note: the question is not about what is permitted and what is not by the architecture of TeX, but about why such architecture was designed in the first place.

Best Answer

While a definitive answer can only come from the Stanford team involved in the development of TeX, and from Professor Knuth in particular, I think we can see some possible reasons.

First, Knuth designed TeX primarily to solve a particular problem (typesetting The Art of Computer Programming). He made TeX sufficiently powerful to solve the typesetting problems he faced, plus the more general case he decided to address. However, he also kept TeX (almost) as simple as possible to achieve this. While expandable versions of everything would be useful, they are not required to solve most of those problems.

Secondly, there are cases where an expandable approach would be at least potentially ambiguous. Bruno's \edef\foo{\def\foo{abc}} is a good case. I'd say that here the expected result with an expandable \def is that \foo expands to nothing, but I'd also say this is not totally clear. There is the much more common case where you want something like

\begingroup
\edef\x{%
  \endgroup
  \def\noexpand\foo{\csname some-macro-to-fully-expand\endcsname}%
}
\x

which would be made more complex with expandable primitives.

The above example points to another grey area: what would happen with things like \begingroup and, more importantly, \relax. The fact that the latter is a non-expandable no-op is often important in TeX programming. (Indeed, the fact that \numexpr, etc., gobble an optional trailing \relax is sometimes regarded as a bad thing.)
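
For instance (a small sketch of that \relax-gobbling behaviour):

\edef\x{\the\numexpr 1+1\relax rest}  % \x -> 2rest; the \relax is absorbed by \numexpr

so a \relax that was intended to remain in the stream as a harmless stopper silently disappears.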

Finally, I suspect that ease of implementation is important. The approach of having separate expansion and execution steps makes the flow relatively easy to understand and, I suspect, easier to implement. An approach which mixes expansion and execution requires a more complex architecture. Here, we have to remember when Knuth was writing TeX: programming ideas which we take for granted today were not necessarily applicable in the late 1970s. A fully expandable approach would, I suspect, have made the code more complex and slower, and the speed impact mattered at a time when TeX was running on shared 'big' computers.