[TeX/LaTeX] When does blank space count

spacing

I suppose this is too general of a question, but why does

 \underbar{word} 

give a different result from

 \underbar{ word} 

while additional blanks do not count, as in

 \underbar{     word}

Best Answer

The rule about spaces is quite simple (or maybe not too simple).

When TeX reads a line of input it is in one of three states: state N (new line), state S (skipping blanks) or state M (middle of line).

It's clear that at the beginning of a line TeX is in state N; when in this state, any blank space (TeXnically a character with category code 10, usually space and TAB) is ignored. TeX leaves state N and enters state M upon finding a non-blank character.

If, while TeX is still in state N, the next character is the end-of-line character (more precisely, a character having category code 5), a \par token is generated and TeX reenters state N for the next line, throwing away anything that possibly follows that character.
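The behavior of state N can be tried out with a small test file. This is only a sketch, assuming a standard LaTeX document with the default category codes; the sample text is made up for illustration.

```latex
\documentclass{article}
\begin{document}
First paragraph, starting in the first column of the source.

      Second paragraph: the leading blanks on this line are skipped in state N.
% The empty line above was read entirely in state N, so it produced \par.
\end{document}
```

Both paragraphs should start the same way in the output: the blanks before "Second" produce no tokens at all, while the empty line is what separates the paragraphs.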

When TeX is in state M it generates a character token for each character it finds unless

  1. the next character is a space, or
  2. the next character is the end-of-line character, or
  3. the next character is the escape character (the backslash).

In case 1 TeX generates a space token and enters state S. In case 2 TeX inserts a space token, throws away what remains on the line and enters state N for reading the next line. In case 3 TeX proceeds to form a symbolic token and there are two subcases:

  4. the next character is a letter (TeXnically, a character with category code 11)
  5. the next character is any other character

In subcase 4 TeX forms the symbolic token's name by accumulating characters as long as they are letters and, upon finding a non-letter, it enters state S. In subcase 5, the symbolic token's name is the single non-letter (this is the case of \, or \\, for instance) and TeX remains in state M, unless that non-letter was a space, in which case it enters state S.

When TeX is in state S, blank space characters are ignored until TeX finds a character that is not a blank space character, which triggers state M again.

The above discussion is not as general as possible, but it is adequate for the normal category code settings.
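A familiar consequence of the rules for states M and S is that spaces after a control word disappear. The following is a minimal sketch, assuming the default category codes; the \TeX logo macro and the sample sentences are only chosen for illustration.

```latex
\documentclass{article}
\begin{document}
\TeX is fun.      % state S after \TeX: the space is skipped, giving "TeXis"
\TeX     is fun.  % extra spaces change nothing, still "TeXis"
\TeX{} is fun.    % after } TeX is in state M, so the space becomes a token
\TeX\ is fun.     % control space: an explicit interword space
\end{document}
```

The first two lines should print without a space after the logo, the last two with one.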

Let's examine your examples. First

\underbar{word}

where \ makes TeX form a symbolic token as in case 3 and, since u is a letter, we are in subcase 4: the name ends just before {, which is not a letter, so the symbolic token \underbar is generated. So we have the following tokens

\underbar • { • w • o • r • d • }

(here the bullets and the blanks around them are not significant; they only separate the tokens).

In the second example

\underbar{ word}

we're in the same situation, but the space after { generates a space token and makes TeX enter state S, which is turned into state M by the w. So the formed tokens are

\underbar • { • <SP> • w • o • r • d • }

(where <SP> denotes a space token). The third example

\underbar{     word}

gives the same result, because upon seeing the first space character after {, TeX generates a space token and enters state S, where the remaining spaces are ignored.
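One way to check these token lists is to write them back to the terminal. This is a sketch under assumptions: it uses \typeout from LaTeX and the e-TeX primitive \detokenize, neither of which appears in the question, and the square brackets are only there to make the spaces visible.

```latex
\documentclass{article}
\begin{document}
% \detokenize turns the already formed tokens back into characters.
\typeout{[\detokenize{word}]}
\typeout{[\detokenize{ word}]}
\typeout{[\detokenize{     word}]}
\end{document}
```

The log should show [word] for the first line and [ word] for both the second and the third, since the run of five spaces was reduced to a single space token when the line was read.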

The same tokens as in the first example would have been produced by

\underbar {word}

because the space stops the formation of the control sequence name and, according to the rule, TeX enters state S without generating a space token. The only dubious case is

\⍽⍽

where ⍽ denotes a space character (just to make clear that there are two of them). But according to the rule explained in subcase 5, this generates the token \⍽ and TeX enters state S. Thus typing \⍽, \⍽⍽ or \⍽⍽⍽ is just the same.
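These last two cases can be checked by printing the tokens to the terminal. Again this is only a sketch: \typeout and the e-TeX primitive \detokenize are my choice of tools, not something from the question, and the letters a and b are arbitrary.

```latex
\documentclass{article}
\begin{document}
% A space after the control word is skipped while reading the line.
\typeout{[\detokenize{\underbar{word}}]}
\typeout{[\detokenize{\underbar {word}}]}
% Spaces after a control space are skipped as well.
\typeout{[\detokenize{a\ b}]}
\typeout{[\detokenize{a\   b}]}
\end{document}
```

The first two lines should produce identical output, and so should the last two. Note that \detokenize itself prints a space after a control word such as \underbar, so that space appears in both of the first two lines and does not come from the input.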
