[Tex/LaTex] When is it harmful to add percent character at end of lines in a \newcommand, or similar macros

Since egreg has graciously agreed not to comment for at least a day on When not to use \ensuremath for math macro? :-), I thought I would take advantage of that and post a question about another (unofficial) campaign of egreg's: additional % at the end of lines in \newcommand and similar macros.

Background:

After starting out with LaTeX and defining numerous macros I ran into TeX Capacity exceeded problems. After considerable time, I was able to come up with an example small enough that reproduced the problem and posted it here:

That and the following provide a good explanation of what is going on:

So, I got into the habit of adding a % at the end of every line within the preamble, even after \usepackage{} and following the last } of a \newcommand, where it is strictly not necessary. My logic was that I did not see any harm, and it was easier to just add it than to think about whether it was needed or not.

Problem:

However, a comment was made in Defining command containing pagebreak, and boxes:

Actually four of your % are redundant. … In some cases a % can even be wrong. 🙂 – egreg

Sometimes, I even added trailing % within the document body, but noticed an answer was edited to remove the %. For example this answer for Producing different versions of a document originally had a trailing % after every line. Now, in this case this is not part of the preamble, but again I thought there was no harm in including them.

The only situation I am aware of where there is an issue with a trailing % is mentioned at What is the use of percent signs (%) at the end of lines?:

\show\ 
\show\ %

Question:

I prefer solutions that require less thought, so I tend to include a trailing % even though it may not be absolutely necessary. So, I would like to know: when is it harmful to add a trailing % in the definitions of \newcommand and similar macros? Are there other cases besides the \show example above?


Here is the MWE: The two macros \mymacroA and \mymacroB are identical, except for the trailing %:

\documentclass{article}
\newcommand\mymacroA[1]{
    #1
}
\newcommand\mymacroB[1]{% <-- This percent is important
    #1% <-- This percent is important
}% This does not appear to be necessary

\begin{document}
 \mymacroA{foo}bar \mymacroA{foo} bar

 \mymacroB{foo}bar \mymacroB{foo} bar
\end{document}

Best Answer

Tokenization stage

The general rule is that spaces after control words (\par, for instance) are ignored, while spaces after control symbols (\!, for instance) are retained; spaces at the beginning of a line are ignored altogether. Consecutive spaces are transformed into one space token, but two consecutive end-of-lines become a \par (this statement is not fully correct, but not too incorrect for the purposes of this answer).
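A minimal sketch of these tokenization rules, assuming an otherwise empty article document; the sample words are made up for illustration:

```latex
\documentclass{article}
\begin{document}
% Control word: the spaces after \TeX are skipped, so the logo is
% immediately followed by "nician".
\TeX    nician

% Control symbol: the spaces after \$ are kept and collapse to one space.
\$    5

% Leading spaces on a line are ignored; consecutive spaces collapse to one.
   a     b
\end{document}
```

The blank lines between the three samples are the "two consecutive end-of-lines" case: each one ends the paragraph.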

After the tokenization stage

There is an obvious intermix between the tokenization stage and subsequent processing. (In what follows "space" will mean "space token" and I won't care about spaces that have already disappeared, such as those after a control word.)

It's now important to know that spaces in the input don't always produce spacing in the output. To understand why, it's necessary to learn some theory.

TeX is always in one of three modes: horizontal, vertical or math.

(The above paragraph tells a lie, strictly speaking: there is a circumstance in which it's not in one of these modes, but it's irrelevant for the discussion.)

Under normal circumstances spaces do not produce output in vertical and math mode (let's not discuss the very special settings that make them appear).

It's quite easy to tell when TeX enters math mode: as soon as it sees $ or $$ (in LaTeX parlance, respectively, \( and all the display math environments, initiated by \[, \begin{equation}, \begin{align} and so on). It exits math mode, returning to the previous mode, when it sees the closing $ or $$ (with similar remarks as before for LaTeX).

Roughly speaking, TeX is in vertical mode at the start of a job or after a \par or when it's beginning a \vbox or \vtop or \vcenter; these are started, in LaTeX, by \parbox and \begin{minipage}. However, spaces in vertical and math mode are not suppressed: they are there, but produce no output. This is a cause for some misunderstandings. TeX starts horizontal mode when it sees a character to be typeset, \noindent, \indent or some other commands, notably \leavevmode (this is not an exhaustive list) or when it's starting an \hbox (for LaTeX it's \mbox, \makebox, \fbox, \colorbox, ... or the lrbox environment). In horizontal mode every space that's not absorbed by other rules (see later) produces output.

  • When TeX is absorbing the preamble of a LaTeX document it is in vertical mode. But a definition such as

    \newcommand{\foo}{
      bar
    }
    

    will produce spaces in the output: the above is equivalent to

    \newcommand{\foo}{ bar }
    

    If \foo is seen at the beginning of a paragraph, the first space will not produce output (TeX is still in vertical mode), but the second will, as the b will trigger horizontal mode. Conversely

    \newcommand{\baz}[1]{\parbox{10cm}{
      #1
    }}
    

    won't show spurious spaces. The first one, before #1, does nothing because it's seen in vertical mode; the final one, after #1, is suppressed because of the final implicit \par that ends the \parbox.

  • A definition such as

    \newcommand{\myop}{
      \overset{t }{=}
    }
    

    will not need % to protect end-of-lines (translated into spaces), because it will be used in math mode (or will give an error anyway).

The most important (but easy) rule to keep in mind is that tokens in the body of a definition are simply stored and not executed (they can be expanded in an \edef or \xdef, but still not executed). The execution is performed (after expansion) when the defined macro is used. The execution of a space token in vertical or math mode does nothing; in horizontal mode it produces space in the output.
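To make the stored-versus-executed distinction concrete, here is a small sketch that reuses the \foo definition from the first bullet; the surrounding letters a and x are arbitrary filler:

```latex
\documentclass{article}
\newcommand{\foo}{
  bar
}
\begin{document}
% Paragraph start: the leading space of \foo is executed in vertical mode
% (no output); the trailing one is executed in horizontal mode.
\foo x

% Mid-paragraph: both spaces are executed in horizontal mode.
a\foo x
\end{document}
```

The first paragraph should come out as "bar x" and the second as "a bar x". The same stored tokens behave differently depending on the mode TeX is in when \foo is used. The space typed between \foo and x plays no role, since it follows a control word.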

Numbers and dimensions

There is a place where space tokens have a peculiar behavior. When TeX is looking for a number or a dimension in order to perform an assignment or when expanding \number and \romannumeral, it expands tokens until an unexpandable token appears or a space token is found. In this case, the space is swallowed as part of the process.
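As a small illustration of the space being swallowed, consider the following sketch; \democount is a hypothetical register name chosen only for this example:

```latex
\documentclass{article}
\newcount\democount % hypothetical register, not part of the answer
\begin{document}
\democount=1
2: \the\democount

\democount=1%
2: \the\democount
\end{document}
```

In the first paragraph the end-of-line after 1 becomes a space that terminates the number and is swallowed, so the register is set to 1 and the 2 is typeset. In the second, the % removes that space, TeX keeps reading digits, and the register is set to 12.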

This is one aspect to be kept in mind when writing macros. Let's see an example: we want to make a "monthly to-do list". Just a \parbox with twelve lines labelled by year and month; a loop seems the best approach, with the year given as argument:

\documentclass{article}
\newcount\monthlycount
\newcommand{\monthlytodo}[1]{\par%
  \fbox{%
    \parbox{10cm}{%
      \monthlycount=1%
      \loop\ifnum\monthlycount<13%
        #1--\number\monthlycount\hrulefill\par%
        \advance\monthlycount by 1%
      \repeat%
    }%
  }%
}
\begin{document}
\monthlytodo{2013}
\end{document}

Try it; this will surprise you with

! TeX capacity exceeded, sorry [main memory size=3000000].

Oh, boy! We've made sure that no spurious spaces were inserted by our macro! Why is TeX betraying us? Simple: the input can be written equivalently as

\loop\ifnum\monthlycount<13#1--\number\monthlycount

and #1 is replaced by 2013. So our loop checks whether the current value of \monthlycount is less than 132013 and stores tokens for the \parbox until TeX runs out of memory.
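The same effect can be isolated without a loop. The following is only a sketch; \badtest and \goodtest are hypothetical names, and the constant 20 is chosen so that the two comparisons give different answers:

```latex
\documentclass{article}
% \badtest and \goodtest are hypothetical names used only for this sketch.
\newcommand{\badtest}[1]{%
  \ifnum 20<13% <-- harmful: the digits of #1 are appended to 13
    #1: yes\else #1: no\fi
}
\newcommand{\goodtest}[1]{%
  \ifnum 20<13
    #1: yes\else #1: no\fi
}
\begin{document}
\badtest{2013}

\goodtest{2013}
\end{document}
```

With the argument 2013, \badtest compares 20 with 132013, so the test is true and the digits of the argument never reach the output. In \goodtest the end-of-line after 13 terminates the number, the test is false, and the argument is typeset in the \else branch.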

Let's try an amended version:

\newcommand{\monthlytodo}[1]{\par
  \fbox{%
    \parbox{10cm}{
      \monthlycount=1
      \loop\ifnum\monthlycount<13
        #1--\number\monthlycount\hrulefill\par
        \advance\monthlycount by 1
      \repeat
    }%
  }%
}

Look, ma! No spurious spaces!


Exercise 1 Why do some lines end with % and other lines don't? We now know that % is harmful after 13 (and also after the 1 in the preceding line).

Technicality: space tokens are looked for but then swallowed also after keywords (such as by in the example and the unit names mm, pt and so on). There are also a few other places, but this answer is already too long. I'll show only how some of the problems can arise also when using LaTeX functions in this context:

\newcounter{monthlycount}
\newcommand{\monthlytodo}[1]{\par
  \fbox{%
    \parbox{10cm}{
      \setcounter{monthlycount}{1}
      \loop\ifnum\value{monthlycount}<13
        #1--\arabic{monthlycount}\hrulefill\par
        \stepcounter{monthlycount}
      \repeat
    }%
  }%
}

Exercise 2 Why isn't % necessary in lines 4–9 of the code above?
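As a side note on the technicality mentioned earlier, that a space is also swallowed after unit names, here is a minimal sketch using \kern; the letters a and b are arbitrary:

```latex
\documentclass{article}
\begin{document}
% The space after "pt" is swallowed as part of reading the dimension:
% only the 10pt kern separates a and b.
a\kern 10pt b

% Here {} follows the unit, so the subsequent space is an ordinary
% space token and is typeset in addition to the kern.
a\kern 10pt{} b
\end{document}
```

The gap in the second line should be wider than in the first by one interword space.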
