[Tex/LaTex] Expansion in \numexpr…\relax versus \pdfstrcmp

e-tex, expansion, pdfstrcmp, pdftex, tex-core

The \numexpr...\relax construction in e-TeX evaluates numerical expressions, fully expanding tokens as it goes.

The \pdfstrcmp{...}{...} construction in pdfTeX lets us compare two lists of tokens after full expansion and conversion to a string (with \detokenize).

Are there specific token lists (parameter-less macros) \foo such that \the\numexpr\foo\relax correctly produces an integer, but \pdfstrcmp{\foo}{} causes a TeX error? It seems that the expansion behaviour is the same in both cases, but one converts its argument to an integer, and the other one to a string.
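
As a baseline, here is a minimal sketch of a well-behaved case where both constructs work. It assumes pdfTeX in extended (e-TeX) mode, and the definition of \foo is only an illustration:

```
\def\foo{1+2}
% \numexpr expands \foo and evaluates the expression
\message{\the\numexpr\foo\relax}% writes 3
% \pdfstrcmp expands \foo and compares the strings "1+2" and ""
\message{\pdfstrcmp{\foo}{}}% writes 1 (the first string sorts after the empty one)
\bye
```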

Best Answer

I see two cases where \the\numexpr...\relax works, but \pdfstrcmp{}{...} will blow up, excluding the obvious case where ... is replaced by 0\relax\undefined, terminating the \numexpr prematurely.

TeX interprets `\a as a number, without expanding \a. Hence, \the\numexpr`\a\relax expands to 97 (the character code of a), whereas \pdfstrcmp{}{`\a} blows up if \a is not defined.

Using \protected control sequences can also cause trouble, because those are forcibly expanded "from the left" in a \numexpr, but will not be expanded by \pdfstrcmp. Take for instance
```
\protected\def\gob#1{}
\the\numexpr 0\gob\undefined  \relax
\pdfstrcmp{}{0\gob\undefined}
```

In the case of \numexpr, \gob is expanded and removes the \undefined control sequence. In the second case, however, the \edef-like expansion leaves the \protected control sequence \gob untouched, and goes on to expand \undefined, which is, well, undefined.
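
The first case can be checked in the same way. This is a small sketch assuming pdfTeX with the plain format, where \a has no definition. The failing line is left commented out so that the file still runs:

```
% `\a is read as a character constant; \a itself is never expanded
\message{\the\numexpr`\a\relax}% writes 97
% Uncommenting the next line raises "Undefined control sequence",
% because \pdfstrcmp expands \a like \edef would:
% \message{\pdfstrcmp{}{`\a}}
\bye
```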

My original goal was to define a macro that takes an argument which is either empty or an integer expression, and that evaluates the integer expression or falls back to a default value when the argument is empty. It seemed illogical to perform expansion in the \numexpr case but not for the emptiness test, so I was thinking of testing with \pdfstrcmp{}{...}. That can't work, for the reasons above. An uglier but more correct choice is the following:

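```
% Make @ a letter so that \z@, \evaluate@ and \@gobble are single control sequences
\catcode`@=11
\def\evaluate#1{\expandafter\evaluate@\the\numexpr#1\z@\z@\relax}
\def\evaluate@#1\z@#2\relax{#1}

\evaluate{1+2+3}
\evaluate{\empty}
% \@gobble comes from the LaTeX kernel; under plain TeX define it first
\evaluate{\@gobble\a}
\evaluate{`\a}
```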

If the argument to \evaluate is empty or expands to an empty argument, the \numexpr expansion will go through all of it and reach the first \z@, evaluating that to 0 (default value), then stop because \z@ does not make sense in an integer expression there. The auxiliary cleans up.

On the other hand, if the argument to \evaluate is a correct integer expression, it is evaluated, and \numexpr stops expanding when encountering the first \z@, and the cleaning up macro removes both \z@.
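
To make the two paths concrete, here is a sketch that repeats the definitions above and writes the results to the terminal. It assumes a format that defines \z@ as the zero dimension (plain TeX and LaTeX both do); the comments trace the expansion by hand:

```
\catcode`@=11
\def\evaluate#1{\expandafter\evaluate@\the\numexpr#1\z@\z@\relax}
\def\evaluate@#1\z@#2\relax{#1}

% \evaluate{1+2+3}
%   -> \expandafter\evaluate@\the\numexpr 1+2+3\z@\z@\relax
%   -> \evaluate@ 6\z@\z@\relax   (\numexpr stops at the first \z@)
%   -> 6                           (#1 = 6, #2 = \z@)
\message{[\evaluate{1+2+3}]}% writes [6]

% \evaluate{}
%   -> \expandafter\evaluate@\the\numexpr\z@\z@\relax
%   -> \evaluate@ 0\z@\relax       (the first \z@ is consumed as the value 0)
%   -> 0                           (#1 = 0, #2 is empty)
\message{[\evaluate{}]}% writes [0]
\bye
```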

I just thought of a better way: "f-expand" the argument (expand fully from the left, stopping at the first non-expandable token and removing it if it is a space) before testing for emptiness:

```
\def\evaluate#1{\expandafter\evaluate@\expandafter{\romannumeral-`0#1}}
\def\evaluate@#1{\the\numexpr\ifcat X\detokenize{#1}X\z@\fi#1\relax}
```

If the argument is empty or will expand to become empty, \romannumeral-`0#1 expands to nothing, and the test in \evaluate@ is true, which means we insert \z@ (default value). Otherwise #1 is evaluated.
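
A quick way to try this version is sketched below. It again assumes pdfTeX with the plain format, so that \z@ and \empty exist and \a is undefined:

```
\catcode`@=11
\def\evaluate#1{\expandafter\evaluate@\expandafter{\romannumeral-`0#1}}
\def\evaluate@#1{\the\numexpr\ifcat X\detokenize{#1}X\z@\fi#1\relax}

% Non-empty argument: \ifcat compares X (a letter) with the detokenized "1",
% the test is false, and 1+2+3 is evaluated
\message{[\evaluate{1+2+3}]}% writes [6]
% \empty disappears during f-expansion, so \ifcat compares X with X and \z@ is used
\message{[\evaluate{\empty}]}% writes [0]
% `\a survives f-expansion untouched and is read as a character constant
\message{[\evaluate{`\a}]}% writes [97]
\bye
```

The emptiness test works because \detokenize only produces characters of category "other" (and spaces), so the first X can only meet another letter when #1 is empty.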

Related Solutions

[Tex/LaTex] Why does \dimexpr swallow \relax

To add to what Hendrik says, I think the overall point was that \numexpr, \dimexpr, etc. can be used in a full expansion context without leaving a stray \relax or space:

```
\edef\example{\the\dimexpr 10 pt + 20 pt \relax}
```

gives \example defined as 30.0pt with no unexpected tokens. That is in many ways much 'neater' than the alternative of leaving the \relax in place. The same argument does not apply to TeX's setting of registers, as that is never expandable, so the issue does not arise.
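
The stored replacement text can be inspected with \meaning. The following is a small sketch, assuming any engine with the e-TeX extensions; note that \the always prints a dimension with a decimal part:

```
\edef\example{\the\dimexpr 10 pt + 20 pt \relax}
% The terminating \relax was absorbed by \dimexpr, so only the value remains
\message{\meaning\example}% writes macro:->30.0pt
\bye
```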

(Of course, for a definitive answer you would need to ask the members of the NTS team who actually wrote this code.)

[Tex/LaTex] \pdfstrcmp or \strcmp in pure TeX

The short answer is 'no'. The longer version is that this primitive (almost certainly) cannot be implemented in macros: primitives rarely can be. In particular, \pdfstrcmp can do an expandable comparison of two sets of tokens on a character ('string') basis, without losing any spaces. In the past, the LaTeX3 team did have some code which attempted to do the same thing using only (e-)TeX primitives, but there were limitations and we ended up with issues as a result. The availability of this primitive is very useful for a number of functions which otherwise cannot be implemented expandably, and the team therefore made a decision to require it in addition to those from e-TeX: it's been available for a number of years.
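
To illustrate what "expandable" and "without losing any spaces" mean in practice, here is a brief sketch, assuming pdfTeX (the macro name \result is only illustrative):

```
% Equal strings compare as 0
\message{\pdfstrcmp{abc}{abc}}% writes 0
% The space in "a b" is kept and sorts before "b"
\message{\pdfstrcmp{a b}{ab}}% writes -1
% Because the primitive is expandable, it can be used inside \edef
\edef\result{\ifnum\pdfstrcmp{abc}{abd}<0 before\else not before\fi}
\message{\meaning\result}% writes macro:->before
\bye
```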
