- I think that what constitutes the end of a line when reading from a file is hard-coded according to the operating system you are running on. TeX then represents that end of line by the character whose number is `\endlinechar`.
- When writing, the character whose number is `\newlinechar` triggers the end of a line. Again, the exact result in the output file is hard-coded, depending on your operating system.
- See #1.
- Usually, the argument to `\scantokens` is treated as a single line. Thus a percent sign (with its usual category code) in the argument to `\scantokens` comments out the rest of the argument. However, any occurrences of the character whose number is `\newlinechar` are used to split the argument into several lines, and a comment then only runs to the end of its own line.
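As a small illustration of the second point, the following plain TeX sketch writes a two-line message to the terminal and log. The choice of `^^J` as the newline character and of stream 16 is an assumption made for the example, not something taken from the question.

```tex
% Sketch: make character 10 (^^J) the newline character for \write.
\newlinechar=`\^^J
% Stream 16 is not attached to a file, so the text goes to the terminal and log.
\immediate\write16{first line^^Jsecond line}
\bye
```

The bytes that actually end each line in the output are still chosen by the TeX implementation for the operating system, as noted above.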
To bring all these ideas together, consider the plain TeX file:
\newlinechar=2
{\catcode`\%=12
\gdef\foo{\scantokens{abc%xyz^^Bdef}}}%
\endlinechar=`X
\foo%
\bye%
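which will typeset the text “abcdefX”. The character `^^B` splits the argument into the lines “abc%xyz” and “def”; the comment discards “xyz”, and the X appended as the end-of-line character survives only on the second line.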
(Edited to take into account what I learned about #4 from the comments.)
When one says
\toks0={\plainoutput}\showthe\toks0
TeX answers
> \plainoutput .
If one says after this
\output=\toks0
then `\showthe\output` gives
> \plainoutput .
If one then says `\hbox{}\penalty-10000`, then the error message
! Missing { inserted.
<to be read again>
\shipout
\plainoutput ->\shipout
\vbox {\makeheadline \pagebody \makefootline }\advan...
<*> \hbox{}\penalty-10000
is issued. Now TeX is in internal vertical mode, as `\end` produces
! You can't use `\end' in internal vertical mode.
but typing `}` doesn't do any good: TeX falls into a black hole in which it no longer interprets any tokens.
It's interesting that TeX adds braces when the assignment to `\output` is of the form `\output=<general text>` but not when it is of the form `\output=<token register>`.
The relevant code for the addition of braces is in module 1226, where Knuth comments "For safety’s sake, we place an enclosing pair of braces around an \output list.", and in module 1227.
An interesting module to examine is 1100, which ends with `output_group:` followed by what is in module 1026.
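If the output routine really has to come from a token register, one way to stay on the safe path is to expand the register into a `<general text>` assignment, so that TeX adds the protective braces itself. The following is a minimal plain TeX sketch; the use of `\toks0` mirrors the example above, and the `\expandafter` idiom is a suggestion rather than something proposed in the discussion.

```tex
\toks0={\plainoutput}
% Unsafe: copies the register as-is, so no enclosing braces are added.
% \output=\toks0
% Safer: \expandafter unpacks \toks0 first, so TeX sees \output={\plainoutput}
% and wraps the list in braces as it does for any <general text>.
\output=\expandafter{\the\toks0}
\showthe\output % should now display the list inside a pair of braces
\hbox{}\penalty-10000 % force a page; the output routine should run normally
\bye
```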
I realize this is not a full answer: it only shows that it's better not to monkey with the closing brace. TeX is in a very particular type of group when it performs the output routine, and disturbing it during this task proves to be quite dangerous.
I've also found a discussion on comp.text.tex that seems to have points in common with this problem.
Best Answer
The `\scantokens` primitive is described in the e-TeX manual as working in a similar manner to writing the tokens to a file and then reading that file back with `\input`, but without the use of files and in an expandable manner. However, it does use some of the same internals as the file-based approach. This has consequences for using the primitive.
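The file-based analogy can be sketched roughly as follows. This is an illustration under assumptions: the file name `scratch.tmp` and the sample text are hypothetical, and the use of `\toks0` and stream 0 is just a convenient choice.

```tex
% Sketch of the file-based analogy for \scantokens{...}.
% The name "scratch.tmp" and the sample text are hypothetical.
\toks0={Some \TeX\ material}
\immediate\openout0=scratch.tmp
\immediate\write0{\the\toks0}% write the tokens out as characters
\immediate\closeout0
\input scratch.tmp
\bye
```

Reading the file back retokenizes the characters under the category codes in force at that moment, which is the effect `\scantokens` provides without touching the disk.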
A pseudo-file is 'read' by TeX, and this is treated as having an end-of-file (EOF) marker. `\scantokens` tries to read this as a token, but that will raise an error, for example when the primitive is used inside an `\edef`.
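A minimal way to provoke the problem, assuming an e-TeX-enabled format and a hypothetical macro name `\demo`, might look like this:

```tex
% Sketch: \edef keeps expanding and runs into the end of the pseudo-file.
\edef\demo{\scantokens{foo}}
% Expected to fail with an error along the lines of
%   ! File ended while scanning definition of \demo.
```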
To prevent this, you need to set `\everyeof` to insert a `\noexpand` before this marker. TeX then does not try to read past the end of the file and this error is avoided.
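A minimal sketch of that setting, again with the hypothetical macro name `\demo` and sample text `foo`:

```tex
% Sketch: \noexpand is inserted just before the end-of-file marker,
% so the \edef no longer runs off the end of the pseudo-file.
\everyeof{\noexpand}
\edef\demo{\scantokens{foo}}
```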
The second issue is that TeX tokenizes the 'end of line' characters in the normal way inside `\scantokens`. The common use is to have a single line scanned, but the result will not be as might be expected: the scanned material ends with an additional space, because the final 'end of line' (end of the pseudo-file) is converted to a space. To prevent this, you normally alter the end-of-line behaviour by setting `\endlinechar=-1`, so that the end-of-line is ignored and no space is added.
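The effect can be checked with `\show`. The sketch below assumes an e-TeX-enabled format; `\demo` and `foo` are hypothetical names chosen for the example.

```tex
\everyeof{\noexpand}
\edef\demo{\scantokens{foo}}
\show\demo % expected to report "foo " with a trailing space
\endlinechar=-1 % from here on no end-of-line character is appended
\edef\demo{\scantokens{foo}}\show\demo % expected to report "foo" without the space
% Note: the changed \endlinechar also affects how the remaining source lines are read.
```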
It's then standard to wrap everything up in a group, for example when saving the result in a macro.
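A sketch of such a wrapper, with a hypothetical macro name `\demo` and sample text `foo`:

```tex
\begingroup
  \everyeof{\noexpand}% avoid the end-of-file error inside \xdef
  \endlinechar=-1 % suppress the space from the end of the pseudo-file
  \xdef\demo{\scantokens{foo}}% the global definition survives \endgroup
\endgroup
```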
The group is used here so that the two additional steps don't affect any other code, while `\xdef` is the simplest way to get the result outside of the group. (An appropriate `\expandafter` chain is also a possible approach for that.)
All of this makes the resulting use non-expandable, which somewhat defeats the point of the primitive (although files are still not used). As a result, in LuaTeX there is a `\scantextokens` primitive which specifically addresses these issues: the end-of-file is ignored and no end-of-line character is inserted after the last line (which is almost always the only line).
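Under LuaTeX the same definition should then need no preparation at all. The sketch below assumes a LuaTeX format in which the primitive is available under its unprefixed name; `\demo` and `foo` are again hypothetical.

```tex
% Sketch for LuaTeX only: no \everyeof or \endlinechar adjustments.
\edef\demo{\scantextokens{foo}}
\show\demo % expected to report "foo" with no trailing space
```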