[TeX/LaTeX] Which commands consisting of a non-letter eat spaces afterwards?

spacing tex-core

Command names consist of a sequence of letters (until and excluding the first non-letter) or of a single non-letter.

Command names ending in a letter (including of course those consisting of a single letter, such as \S and \P) eat all spaces after them. (Compile for example \P␣␣,H.)

What about command names consisting of a single non-letter: which of these eat subsequent spaces? H\$H and H\$␣H produce different output (and the same is true of, e.g., \% and \&). While the space-producing command \, does not eat spaces, the space-producing command \␣ does seem to eat all subsequent spaces. (H\␣H and H\␣␣H and H\␣␣␣H all produce the same output. H \␣H is different; see next paragraph.)

Knowing TeX's behavior explains why, for example, A\,B, A␣\,B/A\,␣B, and A␣\,␣B produce different results (in text mode). A user who is unaware of this and naively (but understandably) assumes that such spacing commands eat all spaces around them might run into surprises. (Actually, only a few commands seem to eat preceding spaces, though such behavior is possible: have your macro start with \unskip.)
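The cases above can be compared side by side by boxing each variant and printing the box width. This is only a sketch: \measure is a hypothetical helper introduced here for illustration, not something from the question. Equal widths mean the extra spaces were discarded, and a larger width means a space survived.

```latex
\documentclass{article}
\begin{document}
% \measure (hypothetical helper): typeset the argument in an \hbox,
% show the box, then print its natural width in parentheses.
\newcommand\measure[1]{\leavevmode\setbox0=\hbox{#1}\copy0\quad(\the\wd0)\par}

% Control word: any number of spaces after \P is skipped,
% so these two lines report the same width.
\measure{\P H}
\measure{\P   H}

% Control symbols other than control space: the space survives,
% so the second line of each pair is wider than the first.
\measure{H\$H}
\measure{H\$ H}
\measure{H\,H}
\measure{H\, H}

% Control space: spaces after it are skipped,
% so these two lines report the same width.
\measure{H\ H}
\measure{H\   H}
\end{document}
```

The spaces are removed or kept when the input is tokenized, so passing the text as a macro argument does not change the comparison.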


Guide to the answers:

  • most concise summary (≈ "only \␣"): Joseph Wright's answer [if it weren't for Heiko's answer, I would have accepted this one]
  • all the details (with an interesting detail about empty (!) command names): Heiko Oberdiek's answer
  • apparent exceptions (the 7 standard one-character, accent-producing commands and \\): Mico's answer

Best Answer

From "The TeXbook":

\ddanger If TeX sees an escape character (category 0) in any state, it scans the entire control sequence name as follows. (a) If there are no more characters in the line, the name is empty (like \csname\endcsname). Otherwise (b) if the next character is not of category 11 (letter), the name consists of that single symbol. Otherwise (c) the name consists of all letters beginning with the current one and ending just before the first nonletter, or at the end of the line. This name becomes a control sequence token. TeX goes into state S in case (c), or in case (b) with respect to a character of category 10 (space) [read: "in case (b) if the single symbol is of category 10 (space)"]; otherwise TeX goes into state M.

State S is "skipping blanks": spaces are ignored there, just as in state N (beginning of line); state M is "middle of line", where a space is significant.

If the name consists entirely of letters, its length does not matter: after one letter or many, TeX ignores the following spaces just as it does at the beginning of a line. The same happens after the command \␣: the command itself sets a space, but the spaces that follow it are ignored.
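One way to watch the two states at work is to look at the token that follows a command. In the sketch below, \after and \report are hypothetical helpers made up for this illustration; \@sptoken is the LaTeX kernel's implicit space token. \after throws away one token and then uses \futurelet to check whether the next token is a space.

```latex
\documentclass{article}
\makeatletter
% \after (hypothetical helper): discard one token (#1), then look at
% the token that follows without removing it from the input.
\newcommand\after[1]{\futurelet\next\report}
% \report (hypothetical helper): say whether that token is a space token.
\newcommand\report{\ifx\next\@sptoken space follows\else no space follows\fi:}
\makeatother
\begin{document}
\after\P   H\par  % control word: state S, prints "no space follows:H"
\after\    H\par  % control space: state S, prints "no space follows:H"
\after\$   H\par  % other control symbol: state M, prints "space follows: H"
\after\,   H\par  % other control symbol: state M, prints "space follows: H"
\end{document}
```

In the last two lines only one space token is seen, because after the first space TeX itself switches to state S and drops the rest.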

Backslash at line end:

When TeX reads a line, it removes the end-of-line characters (carriage return and/or linefeed) and all space characters at the right end (i.e., any that occur immediately before the end-of-line characters). Then it appends the character configured by \endlinechar, unless that is suppressed (e.g. by a negative value). From "The TeXbook":

\ddanger TeX deletes any ⟨space⟩ characters (number 32) that occur at the right end of an input line. Then it inserts a ⟨return⟩ character (number 13) at the right end of the line, except that it places nothing additional at the end of a line that you inserted with I during error recovery. Note that ⟨return⟩ is considered to be an actual character that is part of the line; you can obtain special effects by changing this catcode.

...

\ddanger The special character inserted at the end of each line needn't be ⟨return⟩; TeX actually inserts the current value of an integer parameter called \endlinechar, which normally equals 13 but it can be changed like any other parameter. If the value of \endlinechar is negative or greater than 255, no character is appended, and the effect is as if every line ends with % (i.e., with a comment character).

Note: LuaTeX restricts the values of \endlinechar. The upper limit is 127. Larger values cause the error ! Invalid \endlinechar.
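The "as if every line ends with %" effect described in the quotation can be checked with a small test file. This is a minimal sketch, assuming the default \endlinechar of 13 at the start:

```latex
\documentclass{article}
\begin{document}
% Default: the end of the first line becomes a space, giving "A B".
A
B

\begingroup
% From the next line on, nothing is appended to the input lines.
\endlinechar=-1
A
B
\endgroup

% Explicit comment characters give the same "AB" as the group above.
A%
B
\end{document}
```

The blank line after \endgroup is read only after the group has closed and \endlinechar has been restored, so it ends the paragraph as usual. A blank line read while \endlinechar is negative would not.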

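In LaTeX the end of line character is ^^M (character code 13, 0x0D) and has category 5 (end of line). If TeX is in state M, this end-of-line character is converted to a space [this is the important part!], thus the backslash at the end of line usually becomes \␣. (More precisely, a backslash directly before the end-of-line character forms the control symbol \^^M, which plain TeX and LaTeX define to expand to \␣; the example below exercises this by changing \endlinechar.)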

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}

\begin{document}

\expandafter\def\csname\endcsname{<empty>}
\def\ {<space>}
\def\@{<at>}

[\
]

\begingroup
  \endlinechar=-1
  [\
  ] 
\endgroup

\begingroup
  \endlinechar=`@ %
  [\
  ]%
\endgroup %

\end{document}

Result: the three test cases typeset as [<space>], [<empty>], and [<at>], one per paragraph.