[TeX/LaTeX] Expandable full expansion of tokens that preserves catcodes

expansion pdftex

Is it possible to fully expand tokens in an expandable manner that preserves category codes? I'd like to do this using just pdfTeX.

I'm looking for something that would work like this:

\def\foo{foo}
\fullyexpand{\foo bar baz}

which would expand to the 10 tokens "foobar baz" and retain the category codes.

If we drop the expandable requirement, then this is easy.

\def\fullyexpand#1{\edef\fetemp{#1}\fetemp}
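
To illustrate why the lack of expandability matters, here is a small sketch (the \csname context is only an illustrative choice): the \edef version is fine wherever TeX executes commands, but in an expansion-only context the assignment is never carried out.

\def\foo{foo}
\def\fullyexpand#1{\edef\fetemp{#1}\fetemp}
% Fine where TeX executes commands: typesets "foobar baz".
\fullyexpand{\foo bar baz}
% Not fine in an expansion-only context: \edef is unexpandable, so e.g.
%   \csname\fullyexpand{\foo bar baz}\endcsname
% stops with "Missing \endcsname inserted".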

If we drop the requirement to retain catcodes, it's doable, but less straightforward. Here's the best I could come up with.

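\def\gobbleprefix#1#2\femarker{%
  % #1 is the first character produced by \string. Normally that is the
  % escape character, which is dropped. If \escapechar is out of range,
  % \string prints no escape character, so #1 belongs to the name: keep it.
  \ifnum\escapechar<0
    #1%
  \else\ifnum\escapechar>255
    #1%
  \fi\fi
  #2%
}
\def\fullyexpand#1{%
  % Build the control sequence named by the expansion of #1, then \string it.
  \expandafter\expandafter\expandafter\gobbleprefix
  \expandafter\string\csname#1\endcsname\femarker
}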
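
A rough check that this version really does lose the category codes (a sketch assuming the two definitions above and the default \escapechar; \result is a scratch name introduced here):

\edef\result{\fullyexpand{x}}
% \result now holds "x" with catcode 12 (other), as produced by \string,
% so comparing it with an ordinary letter "x" (catcode 11) fails:
\message{\ifcat\result x letter\else other\fi}% writes "other"

As a side effect, \csname x\endcsname leaves \x defined as \relax if it was previously undefined.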

I was hoping to combine the ε-TeX extension \scantokens with the above (by an appropriate, trivial modification of \gobbleprefix), but of course, that does not work. \scantokens's expansion is empty and pdfTeX just acts like it has opened a new file.

(As an aside, it amuses me that not expanding tokens is expandable—using \unexpanded—but expanding tokens isn't, at least not obviously so.)

Best Answer

Did you try using \romannumeral? This is used a lot for this type of thing (see for example the \exp_args:Nf concept in expl3):

\def\fullyexpand#1{\romannumeral - `0#1}

This works because, after reading the character constant `0, TeX keeps expanding the tokens of #1 while it looks for an optional trailing space. The number itself is always negative (-48), so the Roman numeral is empty and vanishes. Note that this solution stops at the first non-expandable token (and swallows that token if it is a space), unlike an \edef, which keeps going.
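
A small sketch of that difference (\resultA and \resultB are scratch names introduced here; \romannumeral is written out directly so that a single \expandafter chain is enough to trigger it):

\def\foo{foo}
\def\baz{baz}
% \edef expands everything:
\edef\resultA{\foo bar \baz}
\message{\meaning\resultA}% macro:->foobar baz
% \romannumeral-`0 stops at the first unexpandable token, the "f" left by \foo,
% so \baz further along is never touched:
\expandafter\def\expandafter\resultB\expandafter{\romannumeral-`0\foo bar \baz}
\message{\meaning\resultB}% macro:->foobar \baz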

It's possible to build a function which can expand using \romannumeral 'around' unexpandable tokens. For example, the following code will work reasonably well:

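\long\def\fullyexpand#1{%
  % \csname forces expansion of everything up to the \endcsname that
  % \fullyexpandend eventually supplies; {} is the initially empty accumulator.
  \csname donothing\fullyexpandauxi{#1}{}%
}
\long\def\fullyexpandauxi#1{%
  % Expand up to the first unexpandable token of the remaining input.
  \expandafter\fullyexpandauxii\romannumeral -`0#1\fullyexpandend
}
% #1 = first (now unexpandable) token, #2 = remaining input, #3 = accumulator
\long\def\fullyexpandauxii#1#2\fullyexpandend#3{%
  \ifx\donothing#2\donothing
    \expandafter\fullyexpandend
  \else
    \expandafter\fullyexpandloop
  \fi
  {#1}{#2}{#3}%
}
% Nothing left: close the \csname (giving \donothing) and release the result.
\long\def\fullyexpandend#1#2#3{\endcsname#3#1}
% Otherwise append #1 to the accumulator and carry on with the rest.
\long\def\fullyexpandloop#1#2#3{%
  \fullyexpandauxi{#2}{#3#1}%
}
\def\donothing{}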

However, this is not the same as the \expanded primitive (described below), for a few reasons. First, my implementation strips spaces from the argument (the loop grabs undelimited arguments, and TeX skips spaces when doing so). Braces also get stripped. A bit of testing also reveals that \romannumeral expands \protected macros here, whereas \expanded does not. The code above also needs guards for a blank (empty or all-space) argument, as it currently fails in those cases.
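
To make the space and brace stripping concrete, here is a short sketch using the loop-based definitions above (\result is a scratch name introduced here; \edef is used only to capture the output for \meaning):

\def\foo{foo}
\edef\result{\fullyexpand{\foo bar baz}}
\message{\meaning\result}% macro:->foobarbaz   (the space is lost)
\edef\result{\fullyexpand{a{bc}d}}
\message{\meaning\result}% macro:->abcd        (the braces are lost)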


With current releases of LuaTeX one can use \expanded, which does more or less the same as an \edef but is expandable (it also does not require doubled # tokens). This primitive will be in the TeX Live 2019 versions of pdfTeX, e-pTeX and e-upTeX, and hopefully in XeTeX (yet to be confirmed). As a precursor to this, expl3 has a macro-based emulation, slow but working, which examines the input token by token and allows 'e-type' expansion.
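
For engines that do provide it, a minimal sketch of \expanded in use (\result is a scratch name introduced here; this needs LuaTeX or a pdfTeX recent enough to have the primitive):

\def\foo{foo}
% A single expansion step of \expanded performs the full, \edef-like expansion,
% and the tokens keep their category codes:
\expandafter\def\expandafter\result\expandafter{\expanded{\foo bar baz}}
\message{\meaning\result}% macro:->foobar baz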


On the aside, it is possible to use \scantokens expandably but, as you may have found, this can be tricky, and it is usually necessary to make a (non-expandable) change to \everyeof first. LuaTeX addresses this issue with the \scantextokens primitive, which builds the end-of-file handling directly into the primitive. Of course, if you are using LuaTeX then the original problem is solvable anyway, since \expanded is available.
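
A sketch of the usual \everyeof set-up, assuming an e-TeX-enabled engine (\result is a scratch name introduced here, and the \endlinechar change is an extra precaution beyond what is described above). The two assignments are the non-expandable part; once they are in place, \scantokens can be used inside an \edef or \xdef:

\begingroup
  % \noexpand at the end of the pseudo-file stops the end-of-file from
  % raising an error while the definition is being scanned.
  \everyeof{\noexpand}%
  % No end-of-line character, so no stray space is appended to the text.
  \endlinechar=-1 %
  \xdef\result{\scantokens{foo bar baz}}
\endgroup
% \result now holds the retokenized text under the current catcode regime.
\message{\meaning\result}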