[TeX/LaTeX] Why isn’t everything expandable?

Tags: expansion, macros, tex-core

TeX's macro processor does its work in a process called expansion. Given an input stream of tokens, the macro processor repeatedly expands them until only non-expandable tokens remain. The resulting stream of non-expandable tokens is passed to TeX's execution processor. Expansion can be viewed as a function call that is replaced, in place, by its result.

Macros absorb their arguments from the input stream and expand to their replacement text, with the arguments substituted in place. Other kinds of expandable tokens behave differently: for example, conditionals test their arguments (possibly expanding them, too) and skip the branch for which the condition is false.
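
For example (a minimal sketch; the names \double and \test are made up for this illustration):

\def\double#1{#1#1}
\edef\test{\double{ab}\ifnum1<2 yes\else no\fi}

Inside the \edef both \double and \ifnum are expandable, so \test ends up holding only non-expandable character tokens: ababyes.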

But there are also non-trivial tokens that are not expandable: most notably \def and register assignments (the latter are not actually single tokens, but see below). This means that they cannot be used in a macro to obtain a result through expansion: they are simply passed through untouched.

For example,

\edef\test{\def\a{x}\a}

will fail with "! Undefined control sequence.", because \def is not expanded but simply passed through, and when \a is then expanded it turns out to be undefined.
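
In real TeX one either performs the definition before the \edef, or uses \noexpand so that the tokens survive into \test and the definition is carried out only when \test is later executed (a minimal sketch of both arrangements):

\def\a{x}
\edef\test{\a}                             % \test -> x

\edef\test{\def\noexpand\a{x}\noexpand\a}  % \test -> \def\a{x}\a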

Likewise,

\newcount\mycount
\edef\test{\mycount=1}
\showthe\mycount

will show 0, not 1, because, again, nothing in \mycount=1 is expandable.
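
Note that reading a register is a different matter: \the is expandable, so a register's value can be fetched inside an \edef even though it cannot be assigned there. Continuing the example above:

\mycount=1
\edef\test{\the\mycount}  % \test -> 1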

One can imagine a system where such operations are expandable. More precisely, expanding \def would absorb a control sequence name, parameter text and replacement text from the input stream, define the new macro and expand to nothing. Similarly, an operation named \assign would read a register name and a value from the input, perform the assignment and expand to nothing. This could also be extended to \let, \advance, etc.

Thus the above examples would now behave differently: in the first \edef, \def would read in \a{x}, define \a and expand to empty text. After this expansion the token list would contain \a, which would then expand to x.

In the second example, now written as

\edef\test{\assign\mycount1}

\assign would set \mycount to 1 and expand to nothing. As a result, \test would be defined to be empty, but the value of \mycount would have been altered.

This new system would allow some things to be achieved in a more straightforward manner. For example, the problem of defining a macro that expands to n asterisks could now be solved with

\newcount\c
\def\asterisks#1{%
  \assign\c0
  \loop\ifnum\c<#1
    *%
    \advance\c by 1
  \repeat}

This would work because (see the definitions of \loop and \iterate) \def, \let and assignments would now all be expandable. Another substantial consequence would be that many more things could be done in macros whose result is passed as an argument to another macro. Observe how e-TeX's \numexpr and friends are already a considerable step in this direction.
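
Indeed, with e-TeX the asterisks problem above can already be solved expandably even without the hypothetical \assign, since \ifnum and \the\numexpr are expanded in the mouth. A possible sketch (the recursion pattern is standard; the name \asterisks is reused from the question, and the argument must be explicit digits):

\def\asterisks#1{%
  \ifnum#1>0 *%
    \expandafter\asterisks\expandafter{\the\numexpr#1-1\relax}%
  \fi}
\edef\stars{\asterisks{5}}  % \stars -> *****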

The question is: Why doesn't TeX implement such an approach, leaving instead some important operations non-expandable? What are the shortcomings of this approach and the advantages of TeX's implementation?

One possible reason might be that Knuth wanted macros to act as pure functions, incapable of changing the context they are being expanded in. A hint in this direction can be found in the TeXbook:

The expansion of expandable tokens takes place in TeX's "mouth," but
primitive commands (including assignments) are done in TeX's
"stomach." One important consequence of this structure is that it is
impossible to redefine a control sequence or to advance a register
while TeX is expanding the token list of, say, a \message or
\write command; assignment operations are done only when TeX is
building a vertical or horizontal or math list.
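
For instance (a small illustration of this point, reusing the \mycount register allocated earlier):

\message{\advance\mycount by 1}

expands the message text in the mouth; \advance is not expandable, so the terminal simply shows the literal tokens \advance \mycount by 1 and the register is left unchanged.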

Another reason might be that nested and/or recursive macro calls could interfere with each other if they had write access to "external" data available to them.

Note: the question is not about what is permitted and what is not by the architecture of TeX, but about why such architecture was designed in the first place.

Best Answer

While a definitive answer can only come from the Stanford team involved in the development of TeX, and from Professor Knuth in particular, I think we can see some possible reasons.

First, Knuth designed TeX primarily to solve a particular problem (typesetting The Art of Computer Programming). He made TeX sufficiently powerful to solve the typesetting problems he faced, plus the more general case he decided to address. However, he also kept TeX (almost) as simple as possible to achieve this. While expandable versions of everything would be useful, they are not required to solve most of those problems.

Secondly, there are cases where an expandable approach would be at least potentially ambiguous. Bruno's \edef\foo{\def\foo{abc}} is a good case. I'd say that here the expected result with an expandable \def is that \foo expands to nothing, but I'd also say this is not totally clear. There is the much more common case where you want something like

\begingroup
\edef\x{%
  \endgroup
  \def\noexpand\foo{\csname some-macro-to-fully-expand\endcsname}%
}
\x

which would be made more complex with expandable primitives.

The above example points to another grey area: what would happen with things like \begingroup and, more importantly, \relax. The fact that the latter is a non-expandable no-op is often important in TeX programming. (Indeed, the fact that \numexpr, etc., gobble an optional trailing \relax is sometimes regarded as a bad thing.)
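
For instance (a small sketch of that \relax-gobbling behaviour):

\edef\x{\the\numexpr 1+1\relax rest}  % \x -> 2rest; the \relax is absorbed by \numexpr

so a \relax that was intended to remain in the stream as a harmless stopper silently disappears.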

Finally, I suspect that ease of implementation is important. The approach of having separate expansion and execution steps makes the flow relatively easy to understand and, I suspect, easier to implement. An approach which mixes expansion and execution requires a more complex architecture. Here, we have to remember when Knuth was writing TeX: programming ideas which we take for granted today were not necessarily applicable in the late 1970s. A fully expandable approach would, I suspect, have made the code more complex and slower, and the speed impact mattered at a time when TeX was running on shared 'big' computers.