While a definitive answer can only come from the Stanford team involved in development of TeX, and from Professor Knuth in particular, I think we can see some possible reasons.
First, Knuth designed TeX primarily to solve a particular problem (typesetting The Art of Computer Programming). He made TeX sufficiently powerful to solve the typesetting problems he faced, plus the more general case he decided to address. However, he also kept TeX (almost) as simple as necessary to achieve this. While expandable macros are useful, they are not required to solve many issues.
Secondly, there are cases where an expandable approach would be at least potentially ambiguous. Bruno's \edef\foo{\def\foo{abc}} is a good case. I'd say that here the expected result with an expandable \def is that \foo expands to nothing, but I'd also say this is not totally clear. There is the much more common case where you want something like
\begingroup
\edef\x{%
\endgroup
\def\noexpand\foo{\csname some-macro-to-fully-expand\endcsname}%
}
\x
which would be made more complex with expandable primitives.
The above example points to another grey area: what would happen with things like \begingroup and, more importantly, \relax. The fact that the latter is a non-expandable no-op is often important in TeX programming. (Indeed, the fact that \numexpr, etc., gobble an optional trailing \relax is sometimes regarded as a bad thing.)
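A small illustration of both points (my own sketch, not from the original discussion):

```latex
% \relax as an explicit, non-expandable stopping point:
\count0=5 \relax           % \relax survives here as a harmless no-op token
% ...but \numexpr absorbs an optional trailing \relax as its terminator:
\count2=\numexpr 2+3\relax % this \relax is gobbled, not left in the input
```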
Finally, I suspect that ease of implementation was important. Having separate expansion and execution steps makes the flow relatively easy to understand, and, I suspect, to implement. An approach which mixes expansion and execution requires a more complex architecture. Here, we have to remember when Knuth was writing TeX: programming ideas which we take for granted today were not necessarily applicable in the late 1970s. A fully-expandable approach would, I suspect, have made the code more complex and slower, and speed mattered when TeX was running on 'big' computers.
I think it is best not to compare the expandable/not expandable
distinction to concepts from other languages. The main issues relating
to expansion are really particular (some would say peculiar) to the
execution model of TeX.
TeX has two main modes of operation. All assignments and boxing
operations happen (in "the stomach", in The TeXbook's terminology) as
non-expandable operations. Macro expansion happens before that but
unlike (say) the macro expansion of the C pre-processor, macro
expansion and non-expandable operations are necessarily intertwined.
It is probably worth noting that the question as posed is not well defined. TeX tokens are either expandable or non-expandable, but "fully expandable" is a grey area full of traps into which the unwary may fall. Any token defined by \def (or \newcommand etc.) is by definition expandable. A character token such as a is by definition non-expandable. \def is a non-expandable token.
So if you define
\def\zza{}
\def\zzb{a}
\def\zzc{\def\zze{}}
\def\zzd{\ifmmode a \else b\fi}
then each of these is expandable, with expansions <nothing>, a, \def\zze{}, and \ifmmode a \else b\fi respectively.
However, which of these is fully expandable? Clearly \zza is. But if the definition of "fully expandable" means that repeated expansion leaves no unexpandable tokens, then the only fully expandable tokens are those that expand to nothing. So most people would class \zzb as fully expandable, even though it expands to a, which is not expandable.
So a better (or at least more accurate) term than "fully expandable" is "safe in an expansion-only context". Inside \edef and \write, when TeX is looking for a number or dimension, and in a few other places, TeX only does expansion and does not perform any assignment or other non-expandable operation. \edef\yyb{\zzb} is of course safe; it is the same as \def\yyb{a}. So \yyb is safe in an expansion-only context.
\edef\yyc{\zzc} is not safe: it is the same as
\edef\yyc{\def\zze{}}
Now \def doesn't expand, but in an expansion-only context the token just stays inert; it does not make a definition. TeX then tries to expand \zze, which typically is not yet defined, so this leads to an error; or, if \zze does have a definition, it will be expanded, which is almost always unwanted behaviour. This is the basic cause of the infamous "fragile command in a moving argument" errors in LaTeX.
So \zzc is not safe in an expansion-only context. If it had been defined by the e-TeX construct
\protected\def\zzc{\def\zze{}}
then, since protected tokens are made non-expandable in an expansion-only context,
\edef\yyc{\zzc}
would be safe, and the same as \def\yyc{\zzc}. So a protected command is safe in an expansion-only context, but since this safety comes from making the token temporarily non-expandable, it probably isn't accurate to say it is "fully expandable".
\edef\yyd{\zzd} is
\edef\yyd{\ifmmode a \else b\fi}
which is \def\yyd{b}, or \def\yyd{a} if the definition happens inside $...$ (or a displayed equation). Similarly it will expand to b at the start of an array cell, as the expansion happens while TeX is still looking for \omit (for \multicolumn) and so before it has inserted the $ that puts the array cell into math mode. Again, a protected definition to limit expansion is what is required here.
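A minimal sketch of that fix, using the \zzd from above: redefining it with \protected delays the \ifmmode test until the token is actually used.

```latex
\protected\def\zzd{\ifmmode a\else b\fi}
\edef\yyd{\zzd}% \zzd is not expanded here: \yyd is now just \zzd
$\yyd$         % the \ifmmode test now runs inside math mode, giving a
```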
So sometimes it is good to make things expandable as it keeps more
options open.
\def\testa#1#2#3{%
\ifnum#1=0
\def\next{#2}%
\else
\def\next{#3}%
\fi
\next}
\def\firstoftwo#1#2{#1}
\def\secondoftwo#1#2{#2}
\def\testb#1{%
\ifnum#1=0
\expandafter\firstoftwo
\else
\expandafter\secondoftwo
\fi}
both \testa{n}{yes}{no} and \testb{n}{yes}{no} will execute yes if n is 0 and no otherwise, but \testb works by expansion and so is safe in an expansion-only context (if its arguments are safe). The \testa version relies on the internal non-expandable operation of \def\next. (Plain TeX and LaTeX 2.09 implement many tests using \def\next; LaTeX2e changed them to the expandable form where possible.)
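To see the difference, consider using each test inside \edef (a sketch, with \testa and \testb as defined above):

```latex
\edef\resultb{\testb{0}{yes}{no}}% safe: \resultb is defined as yes
% \edef\resulta{\testa{0}{yes}{no}}% fails: the inert \def leaves \next
%                                  % to be expanded, typically undefined
```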
For a numeric test it is easy to use the expandable form, but if you want to test whether two "strings" are equal, by far the easiest way is to write
\def\testc#1#2{%
\def\tempa{#1}\def\tempb{#2}%
\ifx\tempa\tempb
\expandafter\firstoftwo
\else
\expandafter\secondoftwo
\fi}
but now, even though we have used the \expandafter\firstoftwo construct, the test relies on two non-expandable definitions. If you really need to test in an expandable way you can find some questions on this site, but any answer is typically full of special conditions and cases where it doesn't work, and relies on some kind of slow token-by-token loop through the two arguments testing whether they are equal. In 99% of cases this complication is just not needed and the non-expandable test is sufficient. If you are trying to define a consistent set of tests (as with \ifthenelse in the ifthen package, for example), then once you resign yourself to the fact that some tests are necessarily non-expandable, you may choose to make them all non-expandable so that they behave in a consistent way.
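For example (a sketch using the \testc above), the test behaves correctly in normal execution but not in an expansion-only context:

```latex
\testc{abc}{abc}{equal}{different}% typesets "equal"
% Inside \edef, however, the \def tokens stay inert and \tempa, \tempb
% expand to whatever they meant beforehand, so the comparison is
% unreliable:
% \edef\x{\testc{abc}{abd}{equal}{different}}% not safe
```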
So the answer is:
It all depends....
Your question about how to write expandable macros doesn't lend itself to a single correct answer, so I'll make this one CW and maybe other people will feel an urge to contribute.
- Use TeX's flexible macro argument parsing mechanism whenever possible rather than parsing input character by character (which is not expandable if you use \futurelet).
- Separate conditionals into separate macros. For example, if you want to test whether an argument token is some particular token, you can use \ifx\foo#1 ...\else ...\fi, but this introduces additional tokens into the input stream. A better way is
\def\iffoo#1{\ifx#1\foo\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi}
which will not leave any extra tokens to deal with. (Herbert wrote something similar that scooped up all the text up to the \fi, which was pretty clever, but I think this is clearer.) It also nests well.
- It can occasionally be useful to use a CPS (continuation-passing style).
- In several situations, the expansion of a token is the full expansion of its argument. For example, \csname ...\endcsname will expand the ... fully. This can be used to compute a string of character tokens which can be recovered, expandably, using \string. This does lose the catcodes, as all nonspaces will have catcode 12 and spaces will have catcode 10. In other situations the \romannumeral-`X\foo trick can be used to keep expanding \foo until an unexpandable token is reached. It will swallow a space token, though.
- Using ε-TeX extensions like \numexpr ...\relax, arithmetic can be performed expandably fairly easily. There is a mismatch between TeX's truncating \divide and ε-TeX's /, but this can be worked around with a trial multiplication and \ifnum.
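A sketch of the \csname/\string idea (my own reconstruction; \foo and \temp are hypothetical names):

```latex
% Fully expand \foo inside \csname, then recover the characters
% with \string:
\def\foo{abc}
\escapechar=-1 % so \string does not prepend a backslash
\edef\temp{\expandafter\string\csname\foo\endcsname}
% \temp now holds the characters a, b, c with catcode 12
% (side effect: \abc is made equal to \relax if it was undefined)
```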
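The trial-multiplication workaround mentioned in the last point can be sketched as follows (my own sketch, assuming nonnegative operands; \truncdiv is a hypothetical name):

```latex
% e-TeX's / rounds to nearest; detect the overshoot and correct it:
\def\truncdiv#1#2{%
  \numexpr #1/#2
    \ifnum\numexpr(#1/#2)*#2\relax>#1 -1\fi
  \relax}
% usage: \the\truncdiv{7}{2} gives 3 (the rounded 7/2 would be 4)
```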