[TeX/LaTeX] Why are some characters not allowed in command sequences?

catcodes, macros, starred-version, tex-core

From my understanding, a control sequence is ended by any non-alphabetic character, so that \mycsA is one token but \mycs1 is two tokens. This means that starred commands such as \mycs* are actually two tokens, with the * being the first "argument" to \mycs (even when \mycs isn't defined as taking an argument). This seems more confusing than defining a much smaller set of characters that end a control sequence (e.g., white space). What is the advantage of TeX behaving the way it was designed?

EDIT: I realized from David's answer that my focus on terminating characters is incorrect, and I am more interested in the advantages/disadvantages of only allowing a small set of characters to be easily used in command sequences.

Best Answer

"Why?" questions cannot really be answered except by the person who originally designed the system. But in most languages (certainly most languages of the era) the grammar for names is defined by explicitly listing the allowed characters rather than by listing terminating characters. In C or Fortran or most other programming languages, abc+xyz*rst would be three variable tokens separated by the operator tokens + and *, so TeX's behaviour is hardly uncommon.

Unlike those languages, though, almost none of the lexical rules in TeX are fixed, so if you want to allow + and * in multi-letter command names you just need

\catcode`\*=11 \catcode`\+=11

and you can then define \foo*+ as a command. However, \alpha+\beta would no longer work, because TeX would read \alpha+ as a single (undefined) command name; you would have to write \alpha +\beta instead.
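As a rough illustration of that trade-off, here is a minimal plain TeX sketch. The name \foo*+ is hypothetical and used only for this example, and the category code changes are kept inside a group (my own addition) so that they do not leak into the rest of the document.

```tex
% Plain TeX sketch. \foo*+ is a hypothetical name used only for illustration.
\begingroup
  \catcode`\*=11 \catcode`\+=11 % * and + now count as letters
  \def\foo*+{(starred-plus variant)}
  \foo*+ % a single control word whose name is "foo*+"
  $\alpha +\beta$ % the space ends the name \alpha; + is still typeset via its \mathcode
  % $\alpha+\beta$ % would stop with "Undefined control sequence" for \alpha+
\endgroup % the original category codes are restored here
\bye
```

The cost is visible in the math line: once + is a letter, every place where it directly follows a command name needs a separating space.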


It isn't really accurate to say that

\mycs* are actually two tokens with the * being the first "argument" to \mycs (even when \mycs isn't defined as taking an argument).

The * isn't (in general) an argument to \mycs; it is simply the next token in the input stream. Consider \alpha*\beta, where the * is simply typeset as an infix operator between the two symbols.
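To make this concrete, the following plain TeX sketch first shows a macro that ignores the * entirely, and then one way a macro can inspect the next token itself. The names \starcs, \starcscheck, \starcsstar and \starcsplain are hypothetical; the \futurelet lookahead is only one possible approach, similar in spirit to what LaTeX's \@ifstar does.

```tex
% Plain TeX sketch. \mycs, \starcs and the helper names are hypothetical.
\def\mycs{X}
\mycs* % prints "X*": \mycs expands to X, then * is typeset as an ordinary character

$\alpha*\beta$ % * is typeset between the two symbols

% A macro that wants a starred form has to look at the next token itself:
\def\starcs{\futurelet\next\starcscheck}
\def\starcscheck{%
  \ifx\next*\expandafter\starcsstar
  \else\expandafter\starcsplain\fi}
\def\starcsstar*{(starred form)} % the * in the parameter text removes the * from the input
\def\starcsplain{(plain form)}

\starcs* and \starcs.
\bye
```

In both cases the * is an ordinary token following the command; whether it is consumed depends only on how the macro is defined.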
