History – LaTeX2e – \newcommand – Why optional arguments as (]-)delimited arguments


[Please treat this question as a moot point / an "academic" issue.]

The LaTeX-kernel-command \newcommand was invented decades before expl3, xparse and \NewDocumentCommand were available.

\newcommand brings along the concept of an optional argument whereby tokens forming an optional argument need to be nested between [ and ].

With macros defined in terms of \newcommand that process optional arguments, only the presence of an opening square-bracket [ is detected as an indicator for the presence of an optional argument. The presence of a matching ] is not detected.

In short you can say:

With this concept, the indicator for the presence of an optional argument is the presence of an opening square bracket [. The optional argument itself is delimited by ].

Delimiting optional arguments by ] can be a source of problems. For example, the ] delimiter is matched incorrectly when optional arguments are nested within optional arguments, unless the user knows that the entire content of the outer optional argument must be wrapped in curly braces.

My question is:

Why was it decided to have optional arguments not only preceded by a specific character token ([₁₂) but also delimited by another specific character token (]₁₂)?

Why optional arguments as delimited arguments?

Why optional arguments as ]₁₂-delimited arguments?

If delimiting were omitted, one could, instead of a preceding [, have chosen, e.g., a preceding ? for marking the presence of an optional argument.

The syntax of a macro \mymacro processing an optional argument would be:

\mymacro?{optional}{non-optional} or \mymacro{non-optional}, respectively.

(There not being a delimiter implies there not being problems with missing delimiters or delimiter-matching in case of nesting things.)

Best Answer

As a historical note, the second optional argument to \newcommand was actually a later addition to LaTeX, first appearing in LaTeX2e. LaTeX 2.09's \newcommand did not have that facility. So optional arguments, in fact, predate that mechanism as well.

But the historical note is of more than just academic interest in this question: it actually goes a bit towards explaining the decision process here.

The kernel code for commands like \section, \caption and even \newcommand itself shows the original implementation of such features, which involved multi-step processing of input spread across multiple macros (this is why \section calls \@startsection, which in turn calls \@sect or \@ssect depending on whether there is a *). In addition, when LaTeX was first created, 16-bit addressing for TeX's main memory was the norm, so there was really not much space for macro definitions. “Big TeX”, which changed that to 32-bit addressing, was first introduced in the late 80s, and I don't think it became the norm until the 90s at the earliest (I once tried to find the original Big TeX article in TUGboat, but it's a difficult-to-Google subject).

So given all of that, and the challenges of programming in TeX's macro language, Lamport went for simplicity of implementation: thus [₁₂…]₁₂ and not, say, [₁₃…]₁₃, which would also have risked making non-optional-argument uses of brackets a challenge. He also went for ergonomics for users: the choice between delimiting an argument with […] or with {…} makes for an easily observed distinction between the two kinds of argument, which is not quite the case with a prefix flag like ?. Moreover, * is uncommon enough in text and mathematics that \somecommand* is unlikely to be intended to mean "execute \somecommand and then typeset *", whereas it is not unlikely that someone might want to write, e.g., How long have you been using \LaTeX?

As the sophistication of TeX programming has grown in the last 35 years, it's not unlikely that someone attempting to recreate LaTeX from scratch today could in fact allow, say, \section[$\sqrt[3]{x}$]{Cube roots} to be parsed correctly, but doing so would almost certainly break backwards compatibility.

This makes a good case for the idea that the decision to extend LaTeX2e rather than release a potentially breaking LaTeX3 was not the best choice. Backwards compatibility can be a heavy burden to carry.

LaTeX's optional arguments vs. TeX's macro arguments (delimited and undelimited)

The LaTeX concept of optional arguments (i.e., arguments that may or may not be used) is not directly supported by TeX's parsing and execution. A TeX macro always expects the same number of arguments, with the same syntax for delimiting each argument.

Optional arguments in LaTeX are implemented by starting with a macro that takes no arguments and first looks ahead to see what the next token is (for example, looking for a [ or a *). Depending on the result, it then internally calls a TeX macro that parses exactly such a [ or * and expects it to be there.

For the TeX parser there are only two types of macro arguments: "undelimited" and "delimited" arguments. The normal case of "undelimited" arguments is defined like this:
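The lookahead scheme described above can be sketched with the kernel helper \@ifnextchar. This is only a minimal illustration under stated assumptions, not the kernel's actual code for \newcommand; the names \demo and \demo@opt and the text "default" are hypothetical.

```latex
\documentclass{article}
\makeatletter
% \demo takes no arguments itself. It only peeks at the next token:
% if that token is [, it hands over to \demo@opt directly; otherwise
% it inserts a default optional argument first.
\newcommand\demo{\@ifnextchar[{\demo@opt}{\demo@opt[default]}}
% \demo@opt is an ordinary TeX macro that REQUIRES [...] followed by
% one undelimited argument.
\def\demo@opt[#1]#2{optional: (#1), mandatory: (#2)}
\makeatother
\begin{document}
\demo{A}\par     % typesets: optional: (default), mandatory: (A)
\demo[B]{A}      % typesets: optional: (B), mandatory: (A)
\end{document}
```

Note that the second macro is where the ] delimiter comes from: once the [ has been detected, the argument is grabbed by TeX's ordinary delimited-argument mechanism.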

\def\test#1#2#3{... do something with #1 #2 #3}

There can be up to 9 arguments (i.e., you can have #4...#9 in addition).

If you put anything before or after the argument specifiers #<digit> then we are in the situation of "delimited" arguments, e.g.,

\def\test*#1[#2]#3foo{... do something with #1 #2 #3}

Here \test is expected to be followed immediately by *, the first argument extends up to the next [, the second argument is surrounded by [ and ], and the third argument has to be followed by the string foo. But there is nothing "optional" here: all of those components are now required, and if one of them is missing TeX raises a low-level error, either "Use of \test doesn't match its definition" or "Runaway argument".
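As a small illustration of the definition above, the following sketch calls \test with all delimiters present. The replacement text (#1)(#2)(#3) is chosen here only to make the three arguments visible.

```latex
\documentclass{article}
% Same parameter text as above; the body just prints the arguments.
\def\test*#1[#2]#3foo{(#1)(#2)(#3)}
\begin{document}
\test*A[B]Cfoo   % typesets (A)(B)(C)
% \test A[B]Cfoo % no * after \test: "Use of \test doesn't match its definition"
\end{document}
```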

TeX's parsing rules for arguments

The braces in TeX (or more correctly, the characters with catcode 1 and 2, which are normally the open and close brace) play a special role when parsing arguments of macros. TeX keeps track of them when parsing arguments and ensures that they are balanced.

If you have a macro with undelimited arguments, e.g.,

\def\test#1{\def\result{#1}}

then TeX's parser does the following when executing \test:

1. If the first token after \test is not a character with catcode 1 (normally {), then this token simply becomes the argument.
2. Otherwise TeX scans further, looking only at catcodes, until it has seen an equal number of tokens with catcode 1 and 2, in other words a balanced brace group.
3. It then strips off the outer pair of braces, and the remaining material becomes the argument. In other words (with normal TeX catcodes in force), the braces surrounding an argument do not become part of the argument. However, any further braces inside remain.

Example:

\test{A}      \show\result
\test{{A}}   \show\result

this now gives

> \result=macro:
->A.
l.4 \test{A}      \show\result

? 
> \result=macro:
->{A}.
l.5 \test{{A}}   \show\result

i.e. the outer braces are gone.

The above is only done by TeX for tokens with catcode 1 and 2. If you use the concept of delimited arguments (with characters that have other catcodes) then no balancing happens. E.g.

\def\test[#1]{\def\result{#1}}

Now \test is a macro that expects to be followed by [, and its argument ends (normally) when the next ] is parsed. Inside the argument TeX still requires balanced tokens of catcode 1 and 2, but it doesn't care about balancing the brackets: the first ] that is not inside a brace group ends the argument.

So here is what happens in this case when executing \test:

1. TeX expects a [ and complains if it does not find one.
2. Then it starts parsing the argument, looking for the next ] on the same brace level (or more exactly, on the same level of matching tokens with catcode 1 and 2).
3. The moment the ] is found, everything between the delimiters becomes the argument; the tokens that delimit the argument are thrown away.

However, the last statement is not quite right. What actually happens when the closing delimiter is seen is the following: TeX moves to the same state as when finding the end of the argument in the undelimited case. That means it now looks at the candidate argument, and if the whole argument is a single brace group (it starts with a catcode 1 token whose matching catcode 2 token is the last token), TeX strips both off.

Example using above definition:

\test[A]     \show\result
\test[{A}]   \show\result

Now this time we get:

> \result=macro:
->A.
l.10 \test[A]     \show\result

? 
> \result=macro:
->A.
l.11 \test[{A}]   \show\result

So in summary, brace groups (or rather groups of catcode 1 and 2 tokens) are the way to hide something from TeX's scanner that would otherwise be misinterpreted. And if you really want the optional argument to be surrounded by one set of braces (after it has been parsed), you need to write [{{...}}].

Now you may wonder why LaTeX hasn't made [ and ] tokens of catcode 1 and 2, respectively, as well. The answer to this is:

1. They would then not be usable in normal text.
2. More importantly, TeX's scanner only looks at catcodes, so you could then mix and match { with ]; it would all be the same to it.
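The following sketch, using the same \test[#1] definition as above, shows the two situations in which braces inside the brackets are not stripped: when the argument consists of more than one group, and when the braces enclose only part of the argument.

```latex
\documentclass{article}
\def\test[#1]{\def\result{#1}}
\begin{document}
% Two groups: the argument as a whole is not a single brace group,
% so nothing is stripped.
\test[{A}{B}]  \show\result   % ->{A}{B}.
% Braces around part of the argument hide the inner ] from the
% scanner but stay in the argument.
\test[a{]}b]   \show\result   % ->a{]}b.
\end{document}
```

This is why hiding a ] with braces around only part of the argument can have side effects (the braces form a group when the argument is later used), whereas [{...}] around the whole argument does not.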

Example:

\def\test#1{\def\result{#1}}

\catcode`\[=1
\catcode`\]=2

\test{[{A]}]     \show\result

This works and gives:

> \result=macro:
->[{A]}.
l.8 \test{[{A]}]     \show\result

Conclusion for the optional arguments of LaTeX2e

So in summary, one could claim that optional arguments in LaTeX are really correctly specified as [{ (open) and }] (close), and that [...] is just a convenient shortcut for when you do not have to hide anything from the scanner, which is true most of the time. There aren't any side effects from this, as long as these exact sequences are used. Of course, if you use braces only around material in the middle of the argument, they are not stripped and can produce side effects.

In reality, Leslie did it the other way around: he said [...] delimits the optional argument and in case you run into trouble use [{...}] instead.
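A short sketch of that advice, using \newcommand with an optional argument. The macro name \demo and the text "default" are hypothetical and serve only as an illustration.

```latex
\documentclass{article}
\newcommand\demo[2][default]{\def\result{#1||#2}}
\begin{document}
% Without braces the first ] ends the optional argument:
% #1 would be "B and [C", #2 would be "]", and {A} would be left over.
% \demo[B and [C]]{A}
% With [{...}] the inner brackets are hidden, and the braces are stripped:
\demo[{B and [C]}]{A} \show\result   % ->B and [C]||A.
\end{document}
```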

Optional arguments in LaTeX3 via the `xparse` package

The xparse package, which is implemented using expl3, has a more elaborate scanner and is able to deal correctly with nested brackets inside an optional argument, i.e., there is normally no need to hide them from the scanner.

Example:

\documentclass{article}
\usepackage{xparse}

\DeclareDocumentCommand\test{om}{\def\result{#1||#2}}

\test{A} \show\result
\test[B]{A} \show\result
\test[B and [C]]{A} \show\result

When executing this code we get the following results:

> \result=macro:
->-NoValue-||A.
l.6 \test{A} \show\result

? 
> \result=macro:
->B||A.
l.7 \test[B]{A} \show\result

? 
> \result=macro:
->B and [C]||A.
l.8 \test[B and [C]]{A} \show\result

Of course this helps only if the brackets are balanced. In a situation like \test[B] and C]]{A}, for example, there is no way for the scanner to figure out that you meant B] and C] to be the content of the optional argument and not just B. So in such cases one would still need to use [{...}] to hide the unbalanced material.
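A minimal sketch of that workaround, reusing the \test definition from the example above. No output is shown here; running the file and inspecting the \show result is the way to confirm what the optional argument contains.

```latex
\documentclass{article}
\usepackage{xparse}
\DeclareDocumentCommand\test{om}{\def\result{#1||#2}}
\begin{document}
% The braces hide the two unbalanced ] characters from the bracket
% scanner, so the optional argument ends at the final ] before {A}.
\test[{B] and C]}]{A} \show\result
\end{document}
```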
