[TeX/LaTeX] Use of \everyeof and \endlinechar with \scantokens

e-tex tex-core

The e-TeX \scantokens primitive allows retokenization of input. However, it is almost always used in a group where \everyeof and \endlinechar are set (see for example Can I convert a string to catcode 11?). What is the reasoning for requiring these two steps?

Best Answer

The \scantokens primitive is described in the e-TeX manual as working in a similar manner to the following code:

\toks0={...}% '...' is the rescanned material
\immediate\openout0=file
\immediate\write0{\the\toks0}
\immediate\closeout0
\input file

but without the use of files and in an expandable manner. However, it does use some of the same internals as the above, and this has consequences for how the primitive is used.

A pseudo-file is 'read' by TeX, and this is treated as having an end-of-file (EOF) marker. If TeX reaches this marker while it is still scanning something, such as the replacement text of an \edef, it raises an error, for example

! File ended while scanning definition of \demo

with code

\edef\demo{\scantokens{foo}}

To prevent this, you need to set \everyeof to insert a \noexpand before this marker:

\everyeof{\noexpand}
\edef\demo{\scantokens{foo}}

TeX then does not try to read past the end of the file and this error is avoided.

The second issue is that TeX tokenizes the 'end of line' characters in the normal way inside \scantokens. The common use is to have a single line scanned, as above, but the result will not be as might be expected:

\everyeof{\noexpand}
\edef\demo{\scantokens{foo}}
\show\demo

yields

> \demo=macro:
->foo .

with an additional space: the final 'end of line' (end of the pseudo-file) is converted to a space. To prevent this, you normally alter the end-of-line behaviour with

\endlinechar=-1

so that the end-of-line is ignored and no space is added.
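
As a quick check of the combined effect, the following e-TeX sketch applies both settings without a group and shows the result. It assumes the usual default of \endlinechar=13, which is restored by hand at the end.

\everyeof{\noexpand}% absorb the end-of-file marker inside the \edef
\endlinechar=-1 % from here on no end-of-line character is appended
\edef\demo{\scantokens{foo}}
\endlinechar=13\relax % restore the assumed default value
\show\demo

This should report ->foo. with no trailing space. Note that \everyeof stays set after this code and the restored \endlinechar value is only a guess at what was in force before, so the settings leak into whatever follows.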

It's then standard to wrap everything up in a group, for example when saving the result in a macro

\long\def\safescantokens#1#2{%
  \begingroup
    \everyeof{\noexpand}%
    \endlinechar=-1
    \xdef#1{\scantokens{#2}}%
  \endgroup
}
\safescantokens\demo{foo}

The group is used here so that the two additional steps don't affect any other code, while \xdef is the simplest way to get the result outside of the group. (An appropriate \expandafter chain is also a possible approach for that.)
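
A minimal sketch of the \expandafter approach mentioned above, for cases where a local rather than global definition is wanted. The names \localscantokens and the scratch macro \x are purely illustrative and do not come from any package.

\long\def\localscantokens#1#2{%
  \begingroup
    \everyeof{\noexpand}%
    \endlinechar=-1
    \edef\x{\scantokens{#2}}% rescan into a scratch macro inside the group
  % expand \x first, then close the group, then define #1 locally
  \expandafter\endgroup
  \expandafter\def\expandafter#1\expandafter{\x}%
}
\localscantokens\demo{foo}
\show\demo

The chain of \expandafter tokens expands \x while the group is still open, so its contents survive \endgroup and end up as the replacement text of #1. Like the \xdef version, this is not expandable.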

All of this makes the resulting use non-expandable, which somewhat defeats the point of the primitive (although files are still not used). As a result, LuaTeX provides a \scantextokens primitive which specifically addresses these issues: the end-of-file is ignored and no end-of-line character is inserted after the last line (which is almost always the only line).
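
Under LuaTeX, the earlier \edef example can be sketched without any setup. This assumes an engine and format in which \scantextokens is available under that name (for example LuaLaTeX; with plain LuaTeX the primitive may need to be enabled first).

\edef\demo{\scantextokens{foo}}
\show\demo

Given the behaviour described above, this is expected to show ->foo. with neither an EOF error nor a trailing space, and without touching \everyeof or \endlinechar.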
