[TeX/LaTeX] The difference between 'macro' and 'command'

macros

Some people use the words 'macro' and 'command' interchangeably to refer to the instructions given to LaTeX. Is there a real difference between the two terms? If there is, can we see some examples of what the differences are?

Best Answer

I seldom use the term "command" when it is about (La)TeX.

In the TeXbook, the term "command" also appears in contexts where the subject is not a single token but a piece of input from which sequences of control word tokens come into being.

E.g., chapter 15 of the TeXbook includes a double dangerous bend paragraph where you can read:

[...] Therefore plain TeX provides an allocation function for registers; Appendix B includes the command
\newinsert\footins
which defines \footins as the number for footnote insertions. [...]

So, e.g., \newinsert\footins is a "command".

(By the way: The mentioned appendix B does actually not exhibit tokens. It exhibits TeX code, i.e., TeX-input which the programmer of the plain TeX format did produce. That TeX-input itself is not tokens. TeX will create tokens when processing that code.)

In the TeXbook you also find the phrase "primitive command". It is used in contexts where very short and easy-to-grasp sequences of TeX-input play a rôle. (Sequences that, when read and tokenized, yield primitives...)

I suppose, a "command" according to the diction of the TeXbook is a sequence of characters which forms some portion of TeX-input. A portion of TeX-input which, when TeX reads and tokenizes that sequence of characters, yields a set of whatsoever tokens and/or (e.g., in case the command in question contains "invalid characters"/characters of category code 15) some error-messages. In case of yielding a set of tokens, that set of tokens is intended to form a syntactical entity which TeX's token-processing-apparatus can attempt to carry out/to process in whatsoever way. Such an attempt in turn may also yield error-messages and/or erroneous behavior.

Thus the term "command" in the diction of the TeXbook itself seems to actually not focus on things as results of TeX's "digestive processes". One of the earlier results of TeX's digestive processes are tokens. Tokens are produced in TeX's mouth. Thus the term "command" itself seems to actually not refer to tokens. It seems to refer to subsets of the things which TeX "looks at" with its "eyes"—it refers to portions of the TeX-input, i.e., to the things which in interactive mode the user/programmer can type, or which TeX can read from .tex-input-files or \read-handles for tokenization and further processing.

In the LaTeX 2e-kernel you have things like \newcommand and \renewcommand.

These things during LaTeX's digestive processes cause LaTeX to perform assignments. But the point of view seems not to be the view of the LaTeX-program which "looks at" LaTeX-input and while "digesting" produces "tokens" etc. The point of view seems to be the view of the programmer. The programmer does not produce tokens. She/he produces LaTeX-input. That LaTeX-input is not tokens yet. It is a sequence of input-characters.

After writing \newcommand..., the programmer in her/his code can take the thing introduced via \newcommand for a small portion of LaTeX-input which when tokenized by LaTeX yields a token which itself is considered a syntactical entity which, when used as intended, can be carried out via LaTeX's token-processing-apparatus without yielding erroneous behavior.

I could not find the phrase "command token" in texbook.tex.

This can also be taken for an indication that "command" and "token" in the diction of the TeXbook do in the upshot not refer to the same stage of work.

It seems, in the TeXbook the term "command" is used when the focus is on that stage where the programmer produces/delivers/types whatsoever .tex-input.

It seems, in the TeXbook the terms "token" and "macro" are used when the focus is on that stage where the TeX-program does process the .tex-input.


The term "macro" seems to refer to the concept of TeX's tokens while tokens and thus macro-tokens do not come into being while writing/producing/delivering TeX-input but while having the TeX-program read and tokenize TeX-input and from the resulting tokens expand the expandable ones.

As far as I know, macro-tokens are ⟨control sequences⟩ that both are expandable and do not have the meaning of one of those about 300 low-level atomic operations that are called "primitives".

What kinds of tokens can form such ⟨control sequences⟩?

⟨control sequences⟩ come in two flavors:

  1. ⟨control sequence tokens⟩
    These divide into ⟨control symbol tokens⟩ and ⟨control word tokens⟩.
  2. ⟨active character tokens⟩

With this remark we reach the question of how to divide up ⟨control sequences⟩.


As just seen, one way of dividing up ⟨control sequences⟩ is by looking at the question what kinds of tokens can form ⟨control sequences⟩:

⟨control sequence tokens⟩ are those lovely "thingies" that in TeX input need to be preceded by a character whose category code is 0 (escape).
Usually the backslash-character is the only character whose category code is 0.

⟨control sequence tokens⟩ come in two flavors:

⟨control symbol tokens⟩ have names which consist of a single character which does not have category code 11 (letter).

E.g., usually \. and \\ are ⟨control symbol tokens⟩ as usually the characters . and \ do not have category code 11 (letter). \a usually is not a ⟨control symbol token⟩ as usually the character a does have the category code 11 (letter).

⟨control word tokens⟩ have names which consist either of several characters which do not need to have the same category code, or of a single character which does have the category code 11 (letter).

There are two ways in which a ⟨control sequence⟩ (i.e., a ⟨control sequence token⟩ or an ⟨active character token⟩, see below) can come into being:

  1. When (La)TeX reads and tokenizes input.
  2. When (La)TeX inserts it during expansion; e.g., as part of the ⟨replacement text⟩ of a macro; e.g., as part of the \the-expansion of a token-register.

A ⟨control sequence token⟩ can come into being also

  • as the result of expanding some \csname..\endcsname-expression.

Any character whose current category code is not 11 (letter), even a character of category code 15 (invalid), can form the name of a ⟨control symbol token⟩ that came into being as a result of reading and tokenizing TeX-input.
(I say "current" because as soon as the character that forms the name of the ⟨control sequence token⟩ in question is switched to 11 (letter), the ⟨control sequence token⟩ in question will not be treated as a ⟨control symbol token⟩ (any more) but as a ⟨control word token⟩.)

Otherwise you could, e.g., not use ⟨control symbol tokens⟩ for changing the category codes of characters that currently have category code 15 (invalid).

But you can do, e.g.:

\catcode`\A=15 % Now uppercase A is invalid.
% From here on, an "A" that TeX actually tokenizes (i.e., one that is
% neither inside a comment nor discarded behind an end-of-line character)
% yields an error message unless it forms the name of a
% control symbol token.
% That exception is what makes the following line work:
\catcode`\A=11 % Now uppercase A is a letter again.

After elaborating on the names of ⟨control symbol tokens⟩, let's elaborate on the names of ⟨control word tokens⟩:

On the one hand, TeX's reading- and tokenizing-apparatus will only take sequences of characters of category code 11 (letter) as names of ⟨control word tokens⟩.
Thus names of ⟨control word tokens⟩ created via having TeX read and tokenize them directly from TeX-input consist only of characters that had category code 11 (letter) at the time of reading and tokenizing them.
On the other hand, names of ⟨control word tokens⟩ created via evaluating \csname..\endcsname-expressions can consist either of more than one character of whatsoever category code differing from 5, 9, 14 and 15 at the time of reading and tokenizing the \csname..\endcsname-expression, or of a single character of category code 11 (letter).
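
To illustrate the second case, here is a minimal sketch, assuming a standard LaTeX installation. The name "my macro 1" is made up for the demonstration. It contains spaces and a digit, so it cannot be typed directly behind a backslash, but \csname..\endcsname can form it:

```latex
\documentclass{article}
\begin{document}
% The name "my macro 1" (hypothetical) contains spaces and a digit.
% Typing \my macro 1 directly would be tokenized as \my followed by
% character tokens, so the name has to be built via \csname..\endcsname.
\expandafter\def\csname my macro 1\endcsname{Hello from an oddly named macro.}

% Calling the macro requires \csname..\endcsname as well:
\csname my macro 1\endcsname

% \string reveals the name that was actually formed
% (shown on the terminal and in the .log file):
\message{\expandafter\string\csname my macro 1\endcsname}
\end{document}
```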

Now some remarks about the subtle differences between TeX's treatment of ⟨control symbol tokens⟩ and TeX's treatment of ⟨control word tokens⟩:

Usually the space character (code point 32 both in ASCII and in Unicode) and the horizontal tab character (code point 9 both in ASCII and in Unicode) are the only characters which have category code 10 (space).

A character of category code 10 (space) that in the TeX-input occurs behind a character-sequence that got tokenized as a ⟨control symbol token⟩ whose name is formed by a character which does not have category code 10 (space) will be tokenized as an explicit space token.

A character of category code 10 (space) that in the TeX-input occurs behind a character-sequence that got tokenized as a ⟨control symbol token⟩ whose name is formed by a character which does have category code 10 (space) will be ignored/will not yield any token.

E.g., category code 10 (space)-characters that follow the control-space, \␣, i.e., that ⟨control symbol token⟩ whose name is formed by the space character, will be ignored/will not yield any token.

A character of category code 10 (space) that in the TeX-input occurs behind a character-sequence that got tokenized as ⟨control word token⟩ will be ignored/will not yield any token.
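
The difference in the treatment of spaces can be seen in a small sketch. The names \demo and \? are chosen only for this illustration, and a standard LaTeX setup with default category codes is assumed:

```latex
\documentclass{article}
\begin{document}
\def\demo{X}% a control word token (hypothetical name)
\def\?{X}%    a control symbol token (hypothetical choice)

% Spaces behind a control word yield no tokens.
% The next line typesets "XY":
\demo   Y

% The first space behind a control symbol yields a space token.
% The next line typesets "X Y":
\? Y
\end{document}
```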

When (La)TeX does unexpanded-write a ⟨control word token⟩ (be it writing to file, be it writing to the screen), it automatically inserts a space character at the end of the character sequence that represents the ⟨control word token⟩ in question.

When (La)TeX does unexpanded-write a ⟨control symbol token⟩ it will not automatically insert such a trailing space character.

When applying \string for transforming the following token into a sequence of ⟨character tokens⟩ (La)TeX will not attach an additional trailing space token.
(\string produces character tokens of category code 12 (other). Exception: Spaces produced by \string will be so-called explicit space tokens, i.e., character tokens with character code 32 and category code 10 (space).)
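
A minimal sketch of the difference between unexpanded-writing and \string, assuming default category codes and the default \escapechar. \message is used here instead of \write only so that the result shows up on the terminal and in the .log file; \demo and \? are placeholder names that need not be defined:

```latex
\documentclass{article}
\begin{document}
% Unexpanded-writing a control word appends a space:
\message{[\noexpand\demo]}% shows [\demo ]
% \string does not append anything:
\message{[\string\demo]}%   shows [\demo]
% A control symbol is written without a trailing space either way:
\message{[\noexpand\?]}%    shows [\?]
\message{[\string\?]}%      shows [\?]
\end{document}
```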

You can change the way in which TeX unexpanded-writes a ⟨control sequence token⟩ whose name consists of a single character: before writing, change the category code of that character either to 11 (letter), in which case it will be written like a ⟨control word token⟩ with a trailing space, or to a value differing from 11, in which case it will be written like a ⟨control symbol token⟩ without a trailing space.

Thus you can have fun with TeX's automatic insertion of spaces when doing delayed writes with unexpanded ⟨control sequence tokens⟩ whose names consist of a single character—the code

\documentclass{article}
\begin{document}
\newwrite\mywrite
\immediate\openout\mywrite test.txt\relax

\catcode`\A=12

\write\mywrite{b\noexpand\Ab}
\immediate\write\mywrite{b\noexpand\Ab}

\catcode`\A=11

Hello

\end{document}

besides the .dvi- or .pdf-file yields a file test.txt with the following content:

b\Ab
b\A b

As both \write-commands were issued while A was of category code 12 (other), you might have expected each \A to be unexpanded-written as ⟨control symbol token⟩ without a trailing space, and thus you might have expected the output

b\Ab
b\Ab

But the first \write was not \immediate, so it was not carried out immediately but at the time of shipping out the page. At that time, the second \write, which was \immediate, had already been done and A had category code 11 (letter) again. Thus the \A of the first \write ended up in the second line of the file, written like a ⟨control word token⟩, with LaTeX attaching a trailing space character.

The nameless ⟨control sequence token⟩ deserves special attention. There are two methods for forming it: 1) expanding \csname\endcsname; 2) placing a backslash (a character of category code 0 (escape)) at the end of a line while the parameter \endlinechar has a negative value or a value outside the range of code points of the TeX engine's input encoding, so that no end-of-line character gets appended.
Applying \string to it yields the catcode 12 (other) character token sequence \csname\endcsname. There is no space token between \csname and \endcsname. There also is no trailing space token.
Unexpanded-writing it yields the character sequence \csname\endcsname␣—there is a trailing space character.
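
Both behaviours can be checked with a short sketch, assuming the default \escapechar (backslash). \message stands in for \write so that the result appears on the terminal and in the .log file:

```latex
\documentclass{article}
\begin{document}
% \string applied to the nameless control sequence token:
\message{[\expandafter\string\csname\endcsname]}%   shows [\csname\endcsname]
% Unexpanded-writing the nameless control sequence token:
\message{[\expandafter\noexpand\csname\endcsname]}% shows [\csname\endcsname ]
\end{document}
```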

Besides the ⟨control sequence tokens⟩ there is another kind of ⟨control sequences⟩: ⟨active character tokens⟩.

An ⟨active character token⟩ is a ⟨character token⟩ whose category code is 13 (active).
An ⟨active character token⟩ can be used like a ⟨control sequence token⟩.
E.g., after \catcode`\b=13 you can do \def b{The active character \string b is a macro now.}
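
As a slightly larger sketch of the same idea, the following document makes the exclamation mark active inside a group only, so that the change cannot leak into the rest of the document. The choice of "!" and the replacement text are arbitrary and serve only as an illustration:

```latex
\documentclass{article}
\begin{document}
\begingroup
\catcode`\!=13 % "!" is active until the group ends
\def!{\textbf{(bang)}}% the active character token is a macro now
Hello! World!
\endgroup

% Outside the group "!" is an ordinary character again:
Hello! World!
\end{document}
```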


Another way of dividing up ⟨control sequences⟩ is by looking at their "decomposability":

The TeXbook says:

About 300 of TeX's control sequences are called primitive; these are the low-level atomic operations that are not decomposable into simpler functions. All other control sequences are defined, ultimately, in terms of the primitive ones. For example, \input is a primitive operation, but \' and \" are not; the latter are defined in terms of an \accent primitive.

(It should not be overlooked that when defining ⟨control sequences⟩ that are macros, ⟨non-active character tokens⟩ can play a rôle also.)


Yet another way of dividing up ⟨control sequences⟩ is by looking at the actions triggered by them and at the point in time when these actions are triggered:

There is the analogy between TeX's input-processing and digestive processes:

  • TeX input is divided up into so-called tokens (⟨control sequence tokens⟩ and ⟨character tokens⟩) in TeX's mouth.
  • Expansion of expandable ⟨control sequences⟩ takes place in the gullet.
  • Unexpandable ⟨control sequences⟩ will be handled in the stomach.

So you distinguish expandable ⟨control sequences⟩ and unexpandable ⟨control sequences⟩.

Expandable ⟨control sequences⟩ (and their arguments) will in the gullet be replaced by other constellations of tokens. Replacement of expandable ⟨control sequences⟩ takes place in the gullet until there are no more expandable ⟨control sequences⟩ left.

Unexpandable ⟨control sequences⟩ (which came into being either during tokenization in TeX's mouth or via replacement of expandable tokens during expansion) pass the gullet and reach the stomach where boxes will be built, assignment-primitives will be carried out, etc.

There are primitive ⟨control sequences⟩ that are expandable:

E.g., the \string-primitive is an expandable primitive. Together with the following token it vanishes in TeX's gullet and as replacement you get a set of ⟨character tokens⟩ of category code 12 (other) or 10 (space) that either represents the name of the ⟨control sequence token⟩ in question or is the catcode-12 counterpart of the ⟨character token⟩ in question. (When an explicit space token trails \string, \string will deliver an explicit space token.)

E.g., the \csname-primitive is an expandable primitive. It triggers the processing of a \csname..\endcsname-expression in TeX's gullet and it vanishes in TeX's gullet and as replacement you get the corresponding ⟨control symbol token⟩ or ⟨control word token⟩. In case that token is undefined, it will get the meaning of the \relax-primitive within the current scope.
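
The assignment of the \relax-meaning can be observed with \meaning. This is a minimal sketch; the name used inside \csname..\endcsname is made up and assumed to be undefined beforehand:

```latex
\documentclass{article}
\begin{document}
% \expandafter lets \csname..\endcsname form the token first;
% since the token was undefined, it received the meaning of \relax.
\message{\expandafter\meaning\csname a name nobody defined\endcsname}% shows \relax
\end{document}
```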

E.g., the \romannumeral-primitive is an expandable primitive. It and a following ⟨number⟩ vanish in TeX's gullet and in case the ⟨number⟩ is positive as replacement you get a sequence of ⟨character tokens⟩ of category code 12 (other) that represent the corresponding ⟨number⟩ in lowercase-roman notation.

There are primitive ⟨control sequences⟩ that are not expandable:

E.g., the \relax-primitive is not expandable. It does pass the gullet and reach the stomach.

E.g., the assignment-primitives \def, \edef, \gdef, \xdef, \countdef etc are not expandable. They do not trigger their replacement in TeX's gullet but will reach the stomach.
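
The interplay of expandable and unexpandable tokens can be made visible with \edef, which expands everything expandable in its definition text and leaves unexpandable tokens untouched. In this sketch the names \demoinner and \demoresult are invented for the illustration:

```latex
\documentclass{article}
\begin{document}
\def\demoinner{abc}% a macro, thus expandable
% \demoinner is replaced by abc, the unexpandable \relax stays in place,
% and the expandable primitive \romannumeral turns 14 into xiv:
\edef\demoresult{\demoinner\relax\romannumeral 14 }
\message{\meaning\demoresult}% shows macro:->abc\relax xiv
\end{document}
```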

There are non-primitive ⟨control sequences⟩ that are expandable:

Macros are non-primitive ⟨control sequences⟩ that are expandable. They and their arguments get replaced with their ⟨replacement text⟩ in TeX's gullet. Macros are defined in TeX's stomach in terms of the primitives \def, \edef, \gdef and \xdef. Be aware that LaTeX's \newcommand also is just a macro which "decomposes" in LaTeX's gullet to some set of tokens containing the \def-primitive.
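
That \newcommand in the end produces an ordinary macro can be checked with \meaning. This is a small sketch with a made-up macro \greet:

```latex
\documentclass{article}
\begin{document}
\newcommand\greet[1]{Hello, #1!}
% \meaning reports a macro, just as if \long\def had been used:
\message{\meaning\greet}% shows \long macro:#1->Hello, #1!

\greet{World}
\end{document}
```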

There are non-primitive ⟨control sequences⟩ that are not expandable:

You can, e.g., do things like \let\bgroup={.

\bgroup is a ⟨control sequence⟩. More specifically: It is a ⟨control sequence token⟩. It is a ⟨control word token⟩. It is not expandable, but it also does not have the meaning of one of the about 300 non-decomposable functions of TeX that are called primitives. \bgroup is an implicit character token. ;-)

Other kinds of non-primitive ⟨control sequences⟩ that are not expandable are, e.g., \chardef-tokens, \countdef-tokens, \toksdef-tokens, ...
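
A short sketch of how such tokens report themselves via \meaning. All \demo... names are hypothetical, and register number 200 is picked arbitrarily for the demonstration; real code should allocate registers via \newcount, \newtoks, etc.:

```latex
\documentclass{article}
\begin{document}
\let\demobgroup={
\chardef\demochar=65
\countdef\democount=200
\toksdef\demotoks=200
\message{\meaning\demobgroup}% shows begin-group character {
\message{\meaning\demochar}%   shows \char"41
\message{\meaning\democount}%  shows \count200
\message{\meaning\demotoks}%   shows \toks200
\end{document}
```

None of these tokens is a macro, and none of them is itself one of the primitives.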


In order to get used to the terminology, let's unravel some cases:

After carrying out the assignments

\catcode`\W=13 %
\catcode`\X=13 %
\catcode`\Y=13 %
\catcode`\Z=13 %
\newcommand W{The active character \stringW is a macro.}
\newcommand\something{This is a macro.}
\newcommand\+{This is a macro, too.}
\let X=a
\let Y=\relax
\let Z=\romannumeral

,

  • W is a ⟨control sequence⟩: It is an ⟨active character token⟩. It is a macro and thus both expandable and not a primitive.
  • X is a ⟨control sequence⟩: It is an ⟨active character token⟩. It is not expandable. Thus it cannot be a macro. It also is not an unexpandable primitive. It has the same meaning as the catcode 11 (letter) ⟨character token⟩ a. It is an implicit character token.
  • Y is a ⟨control sequence⟩: It is an ⟨active character token⟩. Its meaning equals the meaning of the unexpandable \relax-primitive. Thus it is an unexpandable primitive.
  • Z is a ⟨control sequence⟩: It is an ⟨active character token⟩. Its meaning equals the meaning of the expandable \romannumeral-primitive. Thus it is an expandable primitive.
  • \something is a ⟨control sequence⟩: It is a ⟨control sequence token⟩. It is a ⟨control word token⟩. It is expandable. Its meaning does not equal one of the about 300 non-decomposable functions of TeX that are called primitives. It is a macro.
  • \+ is a ⟨control sequence⟩: It is a ⟨control sequence token⟩. It is a ⟨control symbol token⟩. It is expandable. Its meaning does not equal one of the about 300 non-decomposable functions of TeX that are called primitives. It is a macro.
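
Some of these classifications can be cross-checked with \meaning. The following sketch repeats the assignments for X, Y and Z inside a group, so that the category code changes stay local:

```latex
\documentclass{article}
\begin{document}
\begingroup
\catcode`\X=13 \catcode`\Y=13 \catcode`\Z=13
\let X=a
\let Y=\relax
\let Z=\romannumeral
\message{\meaning X}% shows the letter a
\message{\meaning Y}% shows \relax
\message{\meaning Z}% shows \romannumeral
\endgroup
\end{document}
```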

The term "macro" does always denote a non-primitive expandable ⟨control sequence⟩.
Thus when you know that a token is a macro, you can conclude that it cannot be a ⟨non-active character token⟩ because ⟨non-active character tokens⟩ cannot be ⟨control sequences⟩.
You can conclude that it is a ⟨control sequence⟩.
But you cannot conclude whether the ⟨control sequence⟩ is a ⟨control word token⟩ or a ⟨control symbol token⟩ or an ⟨active character token⟩.

Not all expandable ⟨control sequences⟩ are macros.

There are expandable ⟨primitives⟩ as well.

Not all non-expandable ⟨control sequences⟩ are ⟨primitives⟩.

There are non-expandable non-primitives as well:

E.g., implicit character tokens, \chardef-tokens, \countdef-tokens, \toksdef-tokens, ...