[TeX/LaTeX] ] inside an optional argument

grouping, optional-arguments, syntax

I accidentally discovered (here) that a closing square bracket within an optional argument delimited by [ ] can cause problems.

Here is example code illustrating the issue:

\documentclass{article}


\begin{document}

\tableofcontents

\section[Square brackets ([ and ])]{Square brackets: [ and ]}
% bad: the TOC now has "Square brackets ([ and", and the actual title is a mere ")".

Some text.

\section[{Square brackets ([ and ])}]{Square brackets: [ and ]}
% good (works as originally intended)

Some text.

\end{document}

The text to be used as the optional (as well as the mandatory) argument contains a ], and this character is not meant to terminate the optional argument but to close a matching bracket inside the text. Perhaps this is as it should be, but the question is how best to circumvent such problems. Enclosing the argument in a group works, but I don't know whether doing so has other side effects (perhaps not here in the case of \section, but hopefully someone has a better example of where grouping could lead to a problem). There is no obvious alternative to writing [{ }], because [ and ] are not normally escaped. The fact that [ and ] are ordinary characters that also serve a syntactic function (as optional argument delimiters) is what creates this potentially problematic situation.

Is simply enclosing the entire optional argument in a group ({ }) a good solution? Are there other/better ways?

Best Answer

LaTeX's optional arguments vs. TeX's macro arguments (delimited and undelimited)

The LaTeX concept of optional arguments (i.e., arguments that may or may not be given) is not directly supported by TeX's parsing and execution. TeX macros always expect the same number of arguments, each delimited with the same syntax.

Optional arguments in LaTeX are implemented by starting with a macro that takes no arguments and first looks ahead to see what the next token is (for example, whether it is a [ or a *), and then internally calls a TeX macro that parses exactly such a [ or * and expects it to be there.
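A minimal sketch of that lookahead-then-dispatch pattern, written with the LaTeX2e kernel macro \@ifnextchar. The names \mycmd and \mycmd@opt are hypothetical, and the real \newcommand machinery uses its own internal helpers, so treat this only as an illustration of the idea:

```latex
\documentclass{article}
\makeatletter
% \mycmd (hypothetical name) takes no arguments itself: it only peeks at
% the next token and then calls the internal macro in one of two ways.
\newcommand\mycmd{\@ifnextchar[{\mycmd@opt}{\mycmd@opt[default]}}
% The internal macro has a *delimited* parameter, so it always requires [...].
\def\mycmd@opt[#1]#2{Optional: #1; mandatory: #2.}
\makeatother
\begin{document}
\mycmd{x}     % typesets "Optional: default; mandatory: x."
\mycmd[y]{x}  % typesets "Optional: y; mandatory: x."
\end{document}
```

The "optional" behaviour lives entirely in the lookahead: if no [ follows, the first macro supplies one itself, so the delimited macro always sees the syntax it requires.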

For the TeX parser there are only two types of macro arguments: "undelimited" and "delimited" arguments. The normal case, undelimited arguments, is defined like this:

\def\test#1#2#3{... do something with #1 #2 #3}

There can be up to 9 arguments (i.e., you can have #4...#9 in addition).

If you put anything before or after the argument specifiers #<digit> then we are in the situation of "delimited" arguments, e.g.,

\def\test*#1[#2]#3foo{... do something with #1 #2 #3}

Here \test must be followed immediately by *, the second argument must be surrounded by [ and ], and the third argument must be followed by the string foo (the first argument is whatever lies between the * and the [). But there is nothing "optional" here: these components are always required, and if they are missing TeX generates a low-level error, either "Use of \test doesn't match its definition" or "Runaway argument".
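A small sketch that makes the three delimited parameters of that definition visible by wrapping each one in parentheses (the replacement text is illustrative; the expected \show result is given as a comment):

```latex
\def\test*#1[#2]#3foo{\def\result{(#1)(#2)(#3)}}
\test*x[y]zfoo   \show\result   % shows ->(x)(y)(z).
% By contrast, \test x[y]zfoo stops with
% "Use of \test doesn't match its definition", because the required * is missing.
```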

TeX's parsing rules for arguments

The braces in TeX (more precisely, the characters with catcodes 1 and 2, which are normally the open and close braces) play a special role when macro arguments are parsed. TeX keeps track of them while parsing arguments and ensures that they are balanced.

If you have a macro with undelimited arguments, e.g.,

\def\test#1{\def\result{#1}}

then TeX's parser does the following when executing \test:

  • if the first token after \test is not a character with catcode 1 (normally {) then this next token simply becomes the argument.
  • otherwise it scans further and only looks at catcodes until it sees an equal number of tokens with catcode 1 and 2, in other words a balanced set of brace groups.
  • it then strips off the outermost pair of braces (the catcode 1 and catcode 2 tokens) and the remaining material becomes the argument. In other words (with normal TeX catcodes in force), the braces surrounding an argument do not become part of the argument. However, any further braces inside remain.

Example:

\test{A}      \show\result
\test{{A}}   \show\result

this now gives

> \result=macro:
->A.
l.4 \test{A}      \show\result

? 
> \result=macro:
->{A}.
l.5 \test{{A}}   \show\result

i.e. the outer braces are gone.
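The first rule in the list (a single non-brace token becomes the whole argument) is not covered by that example. A minimal sketch with the same \test definition, with the expected \show results as comments:

```latex
\def\test#1{\def\result{#1}}
\test AB      \show\result   % shows ->A.        only the token A is taken; B is typeset afterwards
\test\relax   \show\result   % shows ->\relax .  a control sequence is also a single token
```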

TeX does this only for tokens with catcodes 1 and 2. If you use delimited arguments whose delimiters are characters with other catcodes, then those delimiters are not balanced. E.g.

\def\test[#1]{\def\result{#1}}

Now \test is a macro that expects to be followed by [, and its argument ends (normally) when the next ] is parsed. Inside the argument TeX still requires balanced tokens of catcodes 1 and 2, but it doesn't care about brackets: the first ] ends the argument unless it lies inside a brace group that has not yet been closed.

So here is what happens in this case when executing \test:

  • TeX expects a [ and complains if it is not there.
  • Then it parses the argument, looking for the next ] at the same brace level (more precisely, at the same nesting level of matching catcode 1 and 2 tokens).
  • The moment that ] is found, everything between the [ and the ] becomes the argument; the delimiting tokens themselves are thrown away.
  • However, the last statement is not quite right: when the closing delimiter is seen, TeX moves to the same state as when finding the end of an undelimited argument. That means it now looks at the candidate argument, and if the whole argument is a single brace group (it starts with a catcode 1 token and ends with the matching catcode 2 token), it strips both off.

Example using above definition:

\test[A]     \show\result
\test[{A}]   \show\result

Now this time we get:

> \result=macro:
->A.
l.10 \test[A]     \show\result

? 
> \result=macro:
->A.
l.11 \test[{A}]   \show\result

So in summary, brace groups (or rather groups of catcode 1 and 2 tokens) are the way to hide something from TeX's scanner that would otherwise be misinterpreted. And if you really want the optional argument to be surrounded by one set of braces after it has been parsed, you need to write [{{...}}].
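Reusing the one-argument definition \def\test[#1]{...} from above, a few more calls reproduce the problem from the question and show exactly when braces are stripped. This is a sketch; the expected \show results are given as comments:

```latex
\def\test[#1]{\def\result{#1}}
\test[[A]]     \show\result   % shows ->[A.      the first ] ends the argument; a stray ] is left over
\test[{[A]}]   \show\result   % shows ->[A].     the braces hide the inner ], then are stripped
\test[{{A}}]   \show\result   % shows ->{A}.     only one pair of braces is stripped
\test[{A}{B}]  \show\result   % shows ->{A}{B}.  not a single group, so nothing is stripped
```

The first line is the same failure as \section[Square brackets ([ and ])]{...} in the question; the second is the [{...}] workaround.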

Now you may wonder why LaTeX hasn't made [ and ] tokens of catcode 1 and 2, respectively. The answer is:

  • they would then not be usable in normal text
  • more importantly, TeX's scanner only looks at catcodes, so you could then mix and match { with ], and it would all be the same to it

Example:

\def\test#1{\def\result{#1}}

\catcode`\[=1
\catcode`\]=2

\test{[{A]}]     \show\result

This works and gives:

> \result=macro:
->[{A]}.
l.8 \test{[{A]}]     \show\result

Conclusion for the optional arguments of LaTeX2e

So in summary, one could claim that optional arguments in LaTeX are really correctly specified as [{ (open) and }] (close), and that [...] is just a convenient shortcut for when you do not have to hide anything from the scanner, which is true most of the time. There aren't any side effects from this as long as the braces directly follow the [ and directly precede the ]. Of course, if you use braces in the middle of the argument, they are not stripped and can produce side effects.

In reality, Leslie Lamport defined it the other way around: [...] delimits the optional argument, and if you run into trouble you use [{...}] instead.
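The same workaround applies to any LaTeX2e optional argument, not just \section. As a sketch, the label of \item in a description list is also an optional argument, so a label containing ] needs the braces too:

```latex
\documentclass{article}
\begin{document}
\begin{description}
  % Braces hide the inner ] from the scanner and are then stripped,
  % so the label is typeset as [a].
  \item[{[a]}] Label with brackets, as intended.
  % Without braces the label is only "[a" and the leftover ]
  % becomes the first character of the item text.
  \item[[a]] Label cut short.
\end{description}
\end{document}
```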

Optional arguments in LaTeX3 via the xparse package

The xparse package, which is implemented using expl3, has a more elaborate scanner and is able to deal with nested brackets inside an optional argument correctly, i.e., there is normally no need to hide them from the scanner.

Example:

\documentclass{article}
\usepackage{xparse}

\DeclareDocumentCommand\test{om}{\def\result{#1||#2}}

\test{A} \show\result
\test[B]{A} \show\result
\test[B and [C]]{A} \show\result

\begin{document}
\end{document}

When executing this code we get the following results:

> \result=macro:
->-NoValue-||A.
l.6 \test{A} \show\result

? 
> \result=macro:
->B||A.
l.7 \test[B]{A} \show\result

? 
> \result=macro:
->B and [C]||A.
l.8 \test[B and [C]]{A} \show\result

Of course this helps only if the brackets are balanced. In a situation like \test[B] and C]]{A} there is no way for the scanner to figure out that you meant B] and C] to be the content of the optional argument and not just B. So in such cases you still need to use [{...}] to hide the unbalanced material.
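Since the LaTeX release of 2020-10-01, the core of xparse (\NewDocumentCommand, \IfNoValueTF and the o/m argument types) is part of the LaTeX kernel, so on current systems the package is not needed. A sketch covering all three cases; \testb is a hypothetical name and the typeset output is not reproduced here:

```latex
\documentclass{article}
% On installations older than 2020-10-01, add \usepackage{xparse}.
% \testb (hypothetical): one optional (o) and one mandatory (m) argument.
\NewDocumentCommand\testb{om}{%
  \IfNoValueTF{#1}{(no optional argument)}{(#1)}: #2\par}
\begin{document}
\testb{A}               % no optional argument given
\testb[B and [C]]{A}    % nested, balanced brackets: no braces needed
\testb[{B] and C]}]{A}  % unbalanced ]: still has to be hidden in braces
\end{document}
```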