I am not asking how the syntax of TeX works. I know perfectly well that only letters are allowed in command names, except for single-character names and constructions using \csname. I’m just asking: Why did Knuth make this peculiar choice? In almost all other programming languages, I can call my commands f1 and f2 and my variables var1 and var2. In TeX, I cannot. Why? What was the intended benefit of this choice?
I know that if numbers had been allowed in command names, the syntax would have had to be adjusted in some places, e.g. I would have to write \kern 5pt rather than \kern5pt. But hey, I actually prefer the former to the latter, so this is hardly a dealbreaker.
Best Answer
We can give a surprisingly specific answer: the PDP-10 had 36-bit words.
Firstly, note that TeX does allow digits in "commands", in two senses. TeX's control sequences are of two kinds: a run of letters, like \TeX (the control sequence ends when a non-letter is encountered), or a single non-letter, like \, or \! or \\ (the control sequence ends after that non-letter).

While many people here have pointed out that by changing the definition of “letter” (assigning the characters 0–9 to category code 11) one can make the digits 0–9 also fall in the first kind of control sequence, note also that even with the default catcodes, TeX allows the ten control sequences \0, \1, ..., \9 (of the "single non-letter" kind).

See for instance the source code of The TeXbook (search for "The following ``rulers'' have been typeset" …), where Knuth defines the macros \1, \2, \3, \4 and \8, and again near "Inserting spaces according to the table", where he defines \0, \1, \2, \3 and uses them like Ord\2Bin\2Ord\3Rel….

(So if the syntax were to be changed as you suggest, then either \2Bin would no longer mean the control sequence "\2" followed by "Bin", or some special additional rules would have to be added, e.g. a control sequence containing digits should not start with a digit, etc. And although you mention in the question that you would not mind writing \kern 5pt, don't forget the very common patterns like \fam0 or \box255, which would all have to be rewritten.)

Anyway, to answer your question, you must remember that the initial design of TeX was done as follows:
So the syntax of TeX is based more or less on Knuth's personal preferences for input conventions during those early months of 1977, alongside constraints about what was easy or possible to program on the SAIL system available to him at the time. It has been pointed out in the comments that Knuth seems to prefer typing the short form without spaces (it makes sense for him, as he writes his books with pencil on paper, and uses a computer only to type them up at the end). This is reason enough (see the example from texbook.tex above), but let's also consider the system constraints of the time.

As far as control sequences go, the very first "Preliminary preliminary description of TEX" [sic] did not even have backslashes; see TEXDR.AFT from May 1977 (reprinted as Chapter 24 of the collection Digital Typography). At this point, there were just "keywords", as in eqn. But very soon (see TEX.ONE from July 1977, reprinted as Chapter 25 of Digital Typography), he had adopted the current approach of using an escape character (backslash by default) for control sequences. All this was before a single line of code had been written.

At this point, he was under the impression that control sequences would not be important or much used. He just needed something that would be easy and efficient to implement, to fit on the machine. He implemented something that was "good enough": Chapter 11 of the SAIL TeX (aka TeX78) manual (included in TeX and METAFONT: New Directions in Typesetting), the precursor to The TeXbook, had all these rules (not applicable to current TeX!):
The main point here is "This fact allows TeX to handle control sequences quite efficiently; and TeX's usefulness is not seriously affected".
We can figure out the reason for these rules, and how they allowed TeX to handle control sequences efficiently. The SAIL language ran on the computers used at the Stanford AI Lab ("SAIL"), which were PDP-10 (aka DECsystem-10) machines with 36-bit words. That is, the machine's native integer type could store any number in [0 … 2^36).
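To see why the word size matters, it helps to count how many distinct names fit in 36 bits. The following is only an illustrative sketch in Python, not the SAIL code: the constant names and the pack_name helper are hypothetical, and the base-27 packing (0 for "no letter", 1 to 26 for a to z, plus one bit for the case of the first letter) is an assumed encoding chosen for simplicity, not necessarily the one TeX78 used.

```python
WORD_BITS = 36                 # PDP-10 word size
WORD_LIMIT = 1 << WORD_BITS    # a word holds integers in [0, 2**36)

# Counting argument: letters only versus letters plus digits.
LETTERS_7_WITH_CASE = 2 * 26**7   # 7 letters, plus the case of the first letter
LETTERS_8 = 26**8                 # 8 letters
ALNUM_7 = 36**7                   # 7 characters drawn from letters and digits
ALNUM_6 = 36**6                   # 6 characters drawn from letters and digits


def pack_name(name, max_len=7):
    """Pack a name of 1..max_len ASCII letters into one integer.

    Hypothetical scheme: each letter becomes a base-27 digit (1..26),
    short names are padded with 0, and the lowest bit records whether
    the first letter is upper case. The case of later letters is ignored.
    """
    if not (name.isascii() and name.isalpha() and 1 <= len(name) <= max_len):
        raise ValueError("expected 1 to %d ASCII letters" % max_len)
    case_bit = 1 if name[0].isupper() else 0
    digits = [ord(c) - ord("a") + 1 for c in name.lower()]
    digits += [0] * (max_len - len(name))   # pad short names with 0
    code = 0
    for d in digits:
        code = code * 27 + d
    return code * 2 + case_bit


if __name__ == "__main__":
    print(LETTERS_7_WITH_CASE < WORD_LIMIT < LETTERS_8)  # letters: 7 fit, 8 do not
    print(ALNUM_7 > WORD_LIMIT, ALNUM_6 < WORD_LIMIT)    # with digits: only 6 fit
    print(pack_name("TeX") != pack_name("tex"))          # first-letter case is kept
    print(pack_name("Zzzzzzz") < WORD_LIMIT)             # largest name still fits
```

Under this assumed encoding the largest packed value is 2*27^7 - 1, which is comfortably below 2^36, whereas a 36-symbol alphabet already overflows the word at length 7.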
Note here that 2*26^7 < 2^36 < 26^8. That is, with an alphabet of 26 English letters, sequences of up to 7 letters could be made to fit in a word (while also distinguishing the case of the first letter), while 8 or more letters would not fit. If the 10 digits had also been included in this alphabet, then the alphabet would have had size 26+10 = 36, and as 36^7 > 2^36, even sequences of length 7 would not have fit; one would have had to limit distinct control sequences to length 6, giving up one whole letter's worth of length just for the small benefit of names like \some14u or whatever.

More precisely, the efficient implementation was done in the following way (see the source code, TEXSYN.SAI):
Even at the time of printing of this manual, these restrictions had started to be relaxed: the final Appendix X "Recent extensions to TeX" (starts with "Stop the presses! The following features were added to TeX just before this manual was printed") ends with (the very last words of that TeX78 manual): "Control sequences of any length are now remembered in full; the seven-letter truncation mentioned in Chapter 2 no longer happens".
When TeX was rewritten in Pascal (WEB) during 1980–1982, many of these restrictions were taken away, but the system had been in wide usage by then and the syntax had more or less converged; there was still no perceived need to allow control sequences mixing letters and digits, which would have broken existing uses of control sequences like \0 and \1.

He touches on this at around 42:00 in this 1982 video (part of a series about the internal details of TeX82):