Since egreg has graciously agreed not to comment for at least a day on "When not to use \ensuremath for math macro?" :-), I thought I would take advantage of that and post a question about another (unofficial) campaign of egreg's: additional % at the end of \newcommand and similar macros.
Background:
After starting out with LaTeX and defining numerous macros I ran into TeX Capacity exceeded problems. After considerable time, I was able to come up with an example small enough that reproduced the problem and posted it here:
That and the following provide a good explanation of what is going on:
- What is the use of percent signs (%) at the end of lines?
- Where are the necessary places to be appended with % to remove unwanted spaces?
So, I got into the habit of adding a % at the end of every line within the preamble, even after \usepackage{} and following the last } of a \newcommand, where it is strictly not necessary. My logic was that I did not see any harm, and that it was easier to just add it than to think about whether it was needed.
Problem:
However, a comment was made in Defining command containing pagebreak, and boxes:
Actually four of your % are redundant. … In some cases a % can even be wrong. 🙂 – egreg
Sometimes I even added a trailing % within the document body, but noticed that an answer was edited to remove the %. For example, this answer for "Producing different versions of a document" originally had a trailing % after every line. In this case it is not part of the preamble, but again I thought there was no harm in including them.
The only situation I am aware of where there is an issue with a trailing % is mentioned at "What is the use of percent signs (%) at the end of lines?":
\show\
\show\ %
Question:
I prefer solutions that require less thought, so I tend to include a trailing % even though it may not be absolutely necessary. So I would like to know: when is it harmful to add a trailing % in the definitions of \newcommand and similar macros? Are there other cases besides the \show example above?
Here is the MWE. The two macros \mymacroA and \mymacroB are identical, except for the trailing %:
\documentclass{article}
\newcommand\mymacroA[1]{
#1
}
\newcommand\mymacroB[1]{% <-- This percent is important
#1% <-- This percent is important
}% This does not appear to be necessary
\begin{document}
\mymacroA{foo}bar \mymacroA{foo} bar
\mymacroB{foo}bar \mymacroB{foo} bar
\end{document}
Best Answer
Tokenization stage
The general rule is that spaces after control words (\par, for instance) are ignored, while spaces after control symbols (\!, for instance) are retained; spaces at the beginning of a line are ignored altogether. Consecutive spaces are transformed into one space token, but two consecutive end-of-lines become a \par (this statement is not fully correct, but not too incorrect for the purposes of this answer).

After the tokenization stage
There is an obvious intermix between the tokenization stage and subsequent processing. In what follows "space" will mean "space token", and I won't care about spaces that have already disappeared (such as those after a control word).
It's now important to know that spaces in the input don't always produce spacing in the output. To understand why, it's necessary to learn some theory.
TeX is always in one of three modes: horizontal, vertical or math.
(The above paragraph tells a lie, strictly speaking: there is a circumstance in which it's not in one of these modes, but it's irrelevant for the discussion.)
Under normal circumstances spaces do not produce output in vertical and math mode (let's not discuss the very special settings that make them appear).
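A minimal sketch of this behavior, using only standard LaTeX; the formulas and the empty groups are arbitrary test material chosen for illustration:

```latex
\documentclass{article}
\begin{document}
% Math mode: the space tokens are there but produce no output,
% so these two formulas are typeset identically.
$a + b = c$ \quad $a+b=c$

% Vertical mode: after the blank line above TeX is between paragraphs.
% The spaces between the empty groups below are executed in vertical mode
% and produce nothing; the paragraph only starts at "X".
{} {} {} X
\end{document}
```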
It's quite easy to tell when TeX enters math mode: as soon as it sees $ or $$ (which, in LaTeX parlance, are, respectively, \( and all the display math environments, initiated by \[, \begin{equation}, \begin{align} and so on). It exits math mode, returning to the previous mode, when it sees the closing $ or $$ (with similar remarks as before for LaTeX).

Roughly speaking, TeX is in vertical mode at the start of a job or after a \par or when it's beginning a \vbox or \vtop or \vcenter; these are started, in LaTeX, by \parbox and \begin{minipage}. However, spaces in vertical and math mode are not suppressed: they are there, but produce no output. This is a cause for some misunderstandings.

TeX starts horizontal mode when it sees a character to be typeset, \noindent, \indent or some other commands, notably \leavevmode (this is not an exhaustive list) or when it's starting an \hbox (for LaTeX it's \mbox, \makebox, \fbox, \colorbox, ... or the lrbox
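environment). In horizontal mode every space that's not absorbed by other rules (see later) produces output.

When TeX is absorbing the preamble of a LaTeX document it is in vertical mode. But a definition of a macro \foo whose replacement text is written on a line of its own, with an end-of-line after the opening brace and another before the closing brace, will produce spaces in the output: each of those end-of-lines becomes a space token, so the definition is equivalent to one whose body is a space, the text, and another space.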
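A minimal sketch of such a definition and its one-line equivalent, assuming the replacement text is the word bar; \fooB is a hypothetical name used only so that both versions fit in one file:

```latex
\documentclass{article}
% Assumed body: the word "bar" on a line of its own.
% The end-of-line after "{" and the one after "bar" each become a space token
% (the indentation before "bar" is ignored).
\newcommand{\foo}{
  bar
}
% Equivalent definition written on one line (\fooB is a hypothetical name).
\newcommand{\fooB}{ bar }
\begin{document}
% At the start of a paragraph the leading space is executed in vertical mode
% and produces nothing; the trailing one is typeset before the "|".
\foo|

% The one-line form behaves identically.
\fooB|

% In mid-paragraph both spaces are typeset.
x\foo|
\end{document}
```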
If \foo is seen at the beginning of a paragraph, the first space will not produce output (TeX is still in vertical mode), but the second will, because the first letter of the text has already triggered horizontal mode. Conversely, a definition whose body is a \parbox with #1 on a line of its own inside it won't show spurious spaces. The first one, before #1, does nothing because it's seen in vertical mode; the final one, after #1, is suppressed because of the final implicit \par that ends the \parbox.

A definition whose body contains only math material spread over several lines will not need % to protect end-of-lines (which are translated into spaces), because it will be used in math mode (or will give an error anyway).

The most important (but easy) rule that should be considered is that tokens in the body of a definition are simply stored and not executed (they can be expanded in an \edef or \xdef, but still not executed). The execution is performed (after expansion) when the defined macro is used. The execution of a space token in vertical or math mode does nothing; in horizontal mode it produces spacing in the output.

Numbers and dimensions
There is a place where space tokens have a peculiar behavior. When TeX is looking for a number or a dimension in order to perform an assignment, or when expanding \number and \romannumeral, it expands tokens until an unexpandable token appears or a space token is found. In the latter case, the space is swallowed as part of the process.

This is one aspect to be kept in mind when writing macros. Let's see an example: we want to make a "monthly to-do list". Just a \parbox with twelve lines labelled by year and month; a loop seems the best approach, with the year given as argument and, to be safe, a % at the end of every line of the definition, including the line with the loop test against 13 that comes just before #1. Try it; this will surprise you with a TeX capacity exceeded error.

Oh, boy! We've made sure that no spurious spaces were inserted by our macro! Why is TeX betraying us? Simple: because the % removes the end-of-line, the input can be written equivalently with the 13 immediately followed by #1, and #1 is replaced by 2013. So our loop checks whether the current value of \monthlycount is less than 132013 and stores tokens for the \parbox until TeX runs out of memory.

Let's try an amended version:
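A minimal sketch of such an amended macro, assuming the count register \monthlycount from the text is allocated with \newcount and that the macro is called \monthlytodo (a hypothetical name); it illustrates the technique rather than reproducing the original listing:

```latex
\documentclass{article}
\newcount\monthlycount % count register used by the loop
% \monthlytodo is a hypothetical name; #1 is the year.
% The lines ending in "1" and "13" deliberately have no %: the end-of-line
% becomes the space that terminates the number. With "<13%" the digits of
% #1 would be read as part of the number being compared.
% \endgraf is used instead of \par as a precaution inside \loop...\repeat.
\newcommand{\monthlytodo}[1]{%
  \par\noindent
  \parbox{\textwidth}{%
    \monthlycount=1
    \loop\ifnum\monthlycount<13
      #1--\number\monthlycount: \hrulefill\endgraf
      \advance\monthlycount by 1
    \repeat
  }%
}
\begin{document}
\monthlytodo{2013}
\end{document}
```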
Look, ma! No spurious spaces!
Exercise 1 Why do some lines end with % and other lines don't? We now know that % is harmful after 13 (and also after the 1 in the preceding line).

Technicality: space tokens are looked for but then swallowed also after keywords (such as by in the example and the unit names mm, pt and so on). There are also a few other places, but this answer is already too long. I'll show only how some of the problems can arise also when using LaTeX functions in this context:

Exercise 2 Why isn't % necessary in lines 4–9 of the code above?
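As a side note on the technicality about unit names, and unrelated to Exercise 2, here is a small sketch of a space being swallowed after a unit. It uses the primitive \hskip purely for illustration; the letters A and B and the 10pt value are arbitrary:

```latex
\documentclass{article}
\begin{document}
% The space after the unit "pt" is swallowed while TeX scans the glue
% specification, so A and B are separated by the 10pt skip only.
A\hskip 10pt B

% \relax ends the scan; the space after the empty group is an ordinary
% space token, so A and B are separated by 10pt plus an interword space.
A\hskip 10pt\relax{} B
\end{document}
```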