*Why* does TeX not allow numbers in command names?

macros tex-history

I am not asking how the syntax of TeX works. I know perfectly well that only letters are allowed in command names, except for single-character names and constructions using \csname. I’m just asking: Why did Knuth make this peculiar choice? In almost all other programming languages, I can call my commands f1 and f2 and my variables var1 and var2. In TeX, I cannot. Why? What was the intended benefit of this choice?

I know that if numbers had been allowed in command names, the syntax would have had to be adjusted in some places, e.g. I would have to write \kern 5pt rather than \kern5pt. But hey, I actually prefer the former to the latter, so this is hardly a dealbreaker.

Best Answer

We can give a surprisingly specific answer: the PDP-10 had 36-bit words.


Firstly, note that TeX does allow numbers in "commands", in two senses. TeX's control sequences are of two kinds:

  • A backslash followed by a sequence of letters, like \TeX (the control sequence ends when a non-letter is encountered), or
  • A backslash followed by a single non-letter, like \, or \! or \\ (the control sequence ends after that non-letter).

While many people here have pointed out that by changing the definition of “letter” (assigning the characters 0–9 to category code 11) one can make the digits 0–9 also fall in the first kind of control sequence, note also that even with the default catcodes, TeX allows the ten control sequences \0 \1 ... \9 (of the "single non-letter" kind).

See for instance the source code of The TeXbook (search for "The following ``rulers'' have been typeset…"), where Knuth defines the macros \1, \2, \3, \4 and \8, and again near "Inserting spaces according to the table", where he defines \0, \1, \2, \3 and uses them like Ord\2Bin\2Ord\3Rel….

(So if the syntax were changed as you suggest, then either \2Bin would no longer mean the control sequence "\2" followed by "Bin", or some special additional rules would have to be added, e.g. that a control sequence containing digits must not start with a digit. And although you mention in the question that you would not mind writing \kern 5pt, don't forget the very common patterns like \fam0 or \box255, which would all have to be rewritten.)
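The scanning rule described above, and what would change if digits counted as letters, can be sketched in a few lines of Python. This is only a simplified model, not TeX's actual tokenizer: the function name read_control_sequence and its letters parameter are made up for illustration, and it ignores category codes other than "letter" as well as the skipping of spaces after a control word.

```python
import string

def read_control_sequence(text, pos, letters=string.ascii_letters):
    """Scan one control sequence whose backslash is at text[pos].

    Returns (name, next_pos). Hypothetical helper: a simplified model
    of the rule, not TeX's real input routine.
    """
    if text[pos] != "\\":
        raise ValueError("expected a backslash")
    start = pos + 1
    if start >= len(text):
        # Nothing follows the backslash: empty name in this model.
        return "", start
    if text[start] not in letters:
        # Second kind: exactly one non-letter character.
        return text[start], start + 1
    # First kind: the name runs until the first non-letter.
    end = start
    while end < len(text) and text[end] in letters:
        end += 1
    return text[start:end], end

# Default rule: only A-Z and a-z are letters.
print(read_control_sequence(r"\kern5pt", 0))  # ('kern', 5), leaving "5pt"
print(read_control_sequence(r"\2Bin", 0))     # ('2', 2), leaving "Bin"

# Hypothetical rule in which the digits are letters too.
with_digits = string.ascii_letters + string.digits
print(read_control_sequence(r"\kern5pt", 0, with_digits))  # ('kern5pt', 8)
print(read_control_sequence(r"\2Bin", 0, with_digits))     # ('2Bin', 5)
```

Under the second rule, \kern5pt and \2Bin each become a single (probably undefined) control sequence, which is exactly the breakage described above.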

Anyway, to answer your question, you must remember that the initial design of TeX was done as follows:

  1. Knuth looked at a (printed) page of TAOCP Vol 2 (typesetting the second edition of which was the original goal of the whole project).
  2. He thought about what he would like to type, in order to get that typeset output (and which he could also implement easily).
  3. He wrote the program to make it happen.

So the syntax of TeX is based more or less on Knuth's personal preferences for input conventions during those early months of 1977, alongside constraints about what was easy/possible to program on the SAIL system available to him at the time. It has been pointed out in the comments that Knuth seems to prefer typing the short form without spaces (it makes sense for him, as he writes his books with pencil on paper, and uses a computer only to type it up at the end). This is reason enough (see the example from texbook.tex above), but let's also consider the system constraints of the time.

As far as control sequences go, the very first "Preliminary preliminary description of TEX" [sic] did not even have backslashes — see TEXDR.AFT from May 1977 (reprinted as Chapter 24 of the collection Digital Typography). At this point, there were just "keywords", as in eqn. But very soon (see TEX.ONE from July 1977, reprinted as Chapter 25 of Digital Typography), he had adopted the current approach of using an escape character (backslash by default) for control sequences. All this was before a single line of code had been written.

At this point, he was under the impression that control sequences would not be important / much used. He just needed something that would be easy and efficient to implement, to fit on the machine. He implemented something that was "good enough": Chapter 11 of the SAIL TeX (aka TeX78) manual (included in TeX and METAFONT: New Directions in Typesetting), the precursor to The TeXbook, had all these rules (not applicable to current TeX!):

[Image: excerpt from Chapter 11 of the SAIL TeX78 manual]

The main point here is "This fact allows TeX to handle control sequences quite efficiently; and TeX's usefulness is not seriously affected".

We can figure out the reason for these rules, and how they allowed TeX to handle control sequences efficiently. The SAIL language ran on the computers used at the Stanford AI Lab ("SAIL"), which were PDP-10 (aka DECsystem-10) machines with 36-bit words. That is, the machine's native integer type could store any number in [0 … 2^36).

Note here that 2*26^7 < 2^36 < 26^8. That is, with an alphabet of 26 English letters, sequences of up to 7 letters could be made to fit in a word (also distinguishing case of the first letter), while 8 or more letters would not fit. If the 10 digits had also been included in this alphabet, then the alphabet would have had size (26+10)=36, and as 36^7 > 2^36, even sequences of length 7 would not have fit; one would have to limit distinct control sequences to length 6, giving up one whole letter's worth of length just for the small benefit of names like \some14u or whatever.
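The inequalities above are easy to check with exact integer arithmetic. The following Python snippet is just a sanity check of the counting argument; the helper name fits and its parameters are invented for this illustration.

```python
WORD_BITS = 36
CAPACITY = 2 ** WORD_BITS  # number of distinct values in a 36-bit word

def fits(alphabet_size, length, first_letter_factor=1):
    """True if all names of this length can get distinct 36-bit codes.

    first_letter_factor=2 models distinguishing upper and lower case
    in the first letter only. (Hypothetical helper for illustration.)
    """
    return first_letter_factor * alphabet_size ** length <= CAPACITY

print(CAPACITY)      # 68719476736
print(2 * 26 ** 7)   # 16063620352
print(26 ** 8)       # 208827064576
print(36 ** 7)       # 78364164096
print(36 ** 6)       # 2176782336

print(fits(26, 7, first_letter_factor=2))  # True: 7 letters fit
print(fits(26, 8))                         # False: 8 letters do not
print(fits(36, 7))                         # False: letters + digits, length 7
print(fits(36, 6))                         # True: letters + digits, length 6
```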

More precisely, the efficient implementation was done in the following way (see the source code, TEXSYN.SAI):

Control sequences, some of which are predeclared, are recorded in a hash table, with an associated table of their equivalent meanings. Linear probing (e.g., Algorithm 6.4L in ACP) is used to access this table […] packed representations of longer control seqences, using six bits for the first letter (in order to distinguish upper and lower case) and five bits for each remaining letter, left justified in the word.
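To make the packing concrete, here is a rough Python sketch of such a scheme: six bits for the first letter, five bits for each of up to six further letters, left justified in a 36-bit word (6 + 6×5 = 36). The specific letter codes are an assumption made for this illustration (1–26 per letter, plus 32 for an upper-case first letter); the actual codes used in TEXSYN.SAI are not given here, and pack_name is a made-up name.

```python
WORD_BITS = 36
MAX_LETTERS = 7  # 6 bits + 6 * 5 bits = 36 bits

def pack_name(name):
    """Pack a control sequence name into one 36-bit integer.

    Assumed encoding (not taken from the SAIL source): each letter
    maps to 1..26 regardless of case; the first letter gets a sixth
    bit (value 32) when it is upper case; 0 bits mean "no letter".
    """
    if not (name.isascii() and name.isalpha()):
        raise ValueError("name must consist of ASCII letters only")
    name = name[:MAX_LETTERS]  # longer names are truncated
    first = name[0]
    word = ord(first.lower()) - ord("a") + 1
    if first.isupper():
        word += 32
    bits = 6
    for ch in name[1:]:
        # Case is discarded for every letter after the first.
        word = (word << 5) | (ord(ch.lower()) - ord("a") + 1)
        bits += 5
    return word << (WORD_BITS - bits)  # left justify in the word

print(pack_name("TeX") != pack_name("tex"))            # True
print(pack_name("hello") == pack_name("hELLO"))        # True
print(pack_name("section") == pack_name("sections"))   # True
print(pack_name("zzzzzzz") < 2 ** WORD_BITS)           # True
```

The three collisions and non-collisions shown correspond to the restrictions in the old manual: case matters only in the first letter, and anything beyond seven letters is ignored. With a 36-symbol alphabet, five bits per character would no longer suffice.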

Even at the time of printing of this manual, these restrictions had started to be relaxed: the final Appendix X "Recent extensions to TeX" (starts with "Stop the presses! The following features were added to TeX just before this manual was printed") ends with (the very last words of that TeX78 manual): "Control sequences of any length are now remembered in full; the seven-letter truncation mentioned in Chapter 2 no longer happens".

When TeX was rewritten in Pascal (WEB) during 1980–1982, many of these restrictions were taken away, but the system had been in wide usage by then and the syntax had more or less converged; there was still no perceived need to allow control sequences mixing letters and digits, and doing so would have broken existing usage of control sequences like \0 and \1.

He touches on this at around 42:00 in this 1982 video (part of a series about the internal details of TeX82):

In the SAIL version of TeX… the implementation of strings was quite inefficient: we didn’t want to use SAIL strings for control sequences. Instead we had to go into our dynamic memory and take away valuable space there for the names of control sequences. My first original design of TeX if anybody remembers way back in 1977, I had real strange restrictions that control sequence had to consist of at most five [he means seven —S] letters, and upper- and lower-case were not distinguished after the first one, and things like this was all so that I could keep a separate table of my control sequences. I had the idea at first that hardly anybody would be defining new control sequences.

[Audience laughter]

I didn’t realize that macros were going to be very powerful at first... gradually we found that people actually want control sequences so we've made room for them.
