What exactly is a “single character” or “symbol” in math mode?

There are at least three instances where single symbols in math mode receive special treatment by the TeX engine: 1. when accents are placed, 2. when a math operator is created with \mathop, and 3. when an accented quantity is surrounded by braces. In each instance, the TeXbook uses a different phrase instead of "single symbol":

  1. "a single character" (p. 443, Rule 12),
  2. "a symbol" (p. 443, Rule 13),
  3. "a single Acc atom" (p. 291).

Now I am wondering: what exactly is meant by these phrases? For example, what qualifies as "a single character"? Do all three phrases mean the same thing? And finally, is it documented somewhere what exactly is meant? (I'll be happy with references to tex.web, too.)


After reading a comment by egreg, it seems to me that the "single symbols" need to be of type \mathord. To explain why, I'll use \mathchar to take the letter A from the math italic font, and I'll test the different classes: \mathord (class 0), \mathop (class 1), \mathbin (class 2), \mathrel (class 3), \mathopen (class 4), \mathclose (class 5), \mathpunct (class 6).

The following image shows that 1. accents are placed properly only for class 0,  2. \mathop centers a symbol with respect to the mathematical axis only for classes 0 and 1 (but in the case of class 1 the symbol is already of type \mathop), and 3. braces around an accented quantity lead to a raised superscript unless the symbol is of class 0.

[Image: rendered output of the test document below]

\documentclass{article}
\begin{document}
1.\ Accent placement:
$
\hat{\mathchar"0141}
\hat{\mathchar"1141}
\hat{\mathchar"2141}
\hat{\mathchar"3141}
\hat{\mathchar"4141}
\hat{\mathchar"5141}
\hat{\mathchar"6141}
$

2.\ Creating a (centered) math operator:
$
\mathop{\mathchar"0141}
\mathop{\mathchar"1141}
\mathop{\mathchar"2141}
\mathop{\mathchar"3141}
\mathop{\mathchar"4141}
\mathop{\mathchar"5141}
\mathop{\mathchar"6141}
$

3.\ Enclosing an accented quantity in braces:
$
{\hat{\mathchar"0141}}^H
{\hat{\mathchar"1141}}^H
{\hat{\mathchar"2141}}^H
{\hat{\mathchar"3141}}^H
{\hat{\mathchar"4141}}^H
{\hat{\mathchar"5141}}^H
{\hat{\mathchar"6141}}^H
$
\end{document}

Best Answer

This is a tricky question. The answer is: 1) and 2) are the same but 3) is something totally different. In other words, Don was a bit sloppy when writing "single character" and "a symbol". In both cases he meant

<math symbol>  ->  <character> | <math character> 

as documented in BNF notation on page 289 in the TeXbook.
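
As a rough illustration of what this production covers, here is a minimal test file. It is my own sketch, not something taken from the TeXbook, and it assumes the standard article class, where \alpha is defined through \mathchardef. The first four accents each take a single <math symbol> as their argument; the last one takes a braced math list instead.

```latex
\documentclass{article}
\begin{document}
$
\hat A                 % <character>: a letter
\hat +                 % <character>: an "other" character
\hat\alpha             % <math character>: a \mathchardef token
\hat\mathchar"0141     % <math character>: an explicit \mathchar
\hat{AB}               % not a <math symbol>: a braced math list
\showoutput\showlists  % dump the current math list to the terminal and log
$
\end{document}
```

In the resulting listing, the first four accents should each show a bare \fam entry as their nucleus, while the last one should show nested \mathord atoms.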

To understand the behavior and the examples, one has to look hard into the code and/or various other places in the documentation, e.g., tex.web. The key to understanding your examples is pages 290-291 of the TeXbook, where Don says:

"A <math field> is used to specify the nucleus, superscript, or subscript of an atom. When a <math field> is a <math symbol>, the f and a numbers of that symbol go into the atomic field. Otherwise the <math field> begins with a {, which causes TeX to enter a new level of grouping and to begin a new math list; the ensuing <math mode material> is terminated by a }, at which point the group ends and the resulting math list goes into the atomic field. If the math list turns out to be simply a single Ord atom without subscripts or superscripts, or an Acc whose nucleus is an Ord, the enclosing braces are effectively removed."

We can observe this behavior with the following small document:

\documentclass{article}
\begin{document}
$
A
{A}
{BC}
\hat A
{\hat A}
{\hat {BC}}
\showoutput\showlists
$
\stop

On the terminal we then get:

### math mode entered at line 3
\mathord
.\fam1 A
\mathord
.\fam1 A
\mathord
.\mathord
..\fam1 B
.\mathord
..\fam1 C
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\mathord
..\fam1 B
.\mathord
..\fam1 C

The A and the {A} both produce the same result: a single \mathord atom whose nucleus is \fam1 A. But {BC} is different: you see additional Ord atoms inside. The other three entries show that the accent "migrates" out of the braces if it is the only atom inside. This is what is meant on page 291, and it answers question 3):

A single Acc is an Acc atom by itself, e.g., \hat{BC} is one, even though its nucleus has 2 math symbols inside.
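
One way to check this brace removal yourself is to attach a superscript and see which atom receives it. The following is only a sketch of such a test, under the assumption that the superscript is listed with whichever atom owns it; I have not reproduced its output here.

```latex
\documentclass{article}
\begin{document}
$
\hat A^H        % no braces: ^H belongs to the Acc atom
{\hat A}^H      % braced list is a single Acc atom
{\hat{BC}}^H    % still a single Acc atom (its nucleus is a list)
{\hat A B}^H    % two atoms inside: the braces have to stay
\showoutput\showlists
$
\end{document}
```

If the rule is applied as described, the first three cases should list the superscript H directly under the \accent entry, whereas the last case should list it under an enclosing \mathord that contains both atoms.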

So now for creating math accents or math operators: your example seems to indicate that "a single character" or "a symbol" refers only to Ord atoms. But this is not true; your example behaves the way it does because of the <math field> behavior defined above. If we examine your example with the \hat removed on each line, e.g.,

$
{\mathchar"0141}
{\mathchar"1141}
{\mathchar"2141}
{\mathchar"3141}
{\mathchar"4141}
{\mathchar"5141}
{\mathchar"6141}
\showoutput\showlists
$

we get

### math mode entered at line 3
\mathord
.\fam1 A
\mathord
.\mathop
..\fam1 A
\mathord
.\mathbin
..\fam1 A
\mathord
.\mathrel
..\fam1 A
\mathord
.\mathopen
..\fam1 A
\mathord
.\mathclose
..\fam1 A
\mathord
.\mathpunct
..\fam1 A

and we can see that, except in the first case, the outer \mathord (which is our <math field>) contains not just \fam1 A but another atom, i.e., a two-level structure rather than a single symbol. So if that becomes the nucleus of an Acc atom, it is first turned into a box (possibly with some spacing attached inside) and only then is the accent placed, which accounts for the difference.

To prove my point, let's remove the braces in your example, which gives rather unreadable but perfectly valid code (a single <math symbol> is one possibility for a <math field>, and \mathchar is such a symbol; see the BNF notation on page 289):

$
\hat\mathchar"0141
\hat\mathchar"1141
\hat\mathchar"2141
\hat\mathchar"3141
\hat\mathchar"4141
\hat\mathchar"5141
\hat\mathchar"6141
\showoutput\showlists
$

And now we suddenly see that indeed any <math symbol> is possible, and there is no difference between the classes as long as the symbol isn't part of a bigger sub-formula in braces:

### math mode entered at line 3
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\fam1 A
\accent\fam0 ^
.\fam1 A 

And of course the same is true if you try to produce a \mathop. The issue is that the moment you use {...}, the material inside becomes a complex structure (a math list), even if you put only a single atom inside. The one exception, documented in the TeXbook paragraph quoted above, is when that single atom is an Ord atom or an Acc atom.
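
For the \mathop case, a comparable test could look like the sketch below. It reuses the class 2 (Bin) and class 0 (Ord) versions of the letter A from the question; the choice of these two classes is mine, purely for illustration.

```latex
\documentclass{article}
\begin{document}
$
\mathop\mathchar"2141      % nucleus is a <math symbol>
\mathop{\mathchar"2141}    % nucleus is a math list holding a Bin atom
\mathop{\mathchar"0141}    % single Ord atom: braces effectively removed
\showoutput\showlists
$
\end{document}
```

The first and third \mathop entries should show \fam1 A directly as the nucleus, so Rule 13 can treat it as a symbol; the second should show a nested \mathbin instead.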

(Deep breath) ... feels a bit like TeXbook p125 :-)
