[Tex/LaTex] (La)TeX — What does the ‘%’ character do

syntax tex-core

The % character seems to have many uses. So far, I have encountered it in the following places:

For writing comments

% Schrödinger wave equation
$$\nabla^2\psi + \frac{8\pi^2m}{h^2}(E-V)\psi = 0$$

(I may be incorrect here) For splitting commands over multiple lines (reference here)

\newcounter{bull}
\newcommand{\showbullcntr}[1]{%
\setcounter{bull}{#1}%
\bullcntr{bull}%
}

In conjunction with minipage, to remove the leading space inserted (adapted from this question)

\noindent
\begin{minipage}{0.5\linewidth}
   ...
\end{minipage}%
\begin{minipage}{0.5\linewidth}
   ...
\end{minipage}

Could someone please explain, in detail, the various uses of the % character in LaTeX? In particular, why is it used after the first instance of \end{minipage}?

Best Answer

The three use cases for the % character that you've listed can be unified under a single, overarching use case: to exclude everything on the remainder of the current input line from further processing.

This "everything" includes not only comments (the first use case you mentioned) but also the invisible end-of-line character at the end of the input line. After encountering a % character, TeX also ignores any whitespace at the start of the next line. In fact, whitespace at the start of an input line is always ignored, whether or not the preceding line was terminated with %. (Verbatim mode is an exception, but that's a topic for a different discussion.)

Thus,

Hello %
World

and

Hello % 
    World

end up being processed the same way: TeX reads Hello from the first line (with the space after "Hello" included) and World from the second line and will output "Hello World". Observe that the end-of-line character at the end of the first line and the four whitespace characters at the start of the second line have been discarded.

What would happen if the input were

Hello% 
    World

The output will now be "HelloWorld". Why? Well, as before, the end-of-line character at the end of the first line and the four whitespace characters at the start of the second line get discarded, and what's left over is "Hello" from the first line and "World" from the second.

One more piece of information: A single end-of-line character gets converted to a single space during TeX's first processing phase. (In contrast, two or more consecutive end-of-line characters, i.e., one or more blank lines, get converted to a \par token.) Thus,

Hello 
World

and

Hello World

create the same output.
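The cases discussed so far can be collected in one small test file. The following is a minimal sketch, assuming the standard article class; the \noindent and \par commands are not part of the original examples and are only there to put each variant on its own output line.

```latex
\documentclass{article}
\begin{document}
\noindent
% Space before %, so one space survives: "Hello World"
Hello %
    World\par
\noindent
% No space before %, so nothing separates the words: "HelloWorld"
Hello%
    World\par
\noindent
% No % at all: the end of line becomes a single space: "Hello World"
Hello
    World
\end{document}
```

In all three variants the indentation before World is dropped, because whitespace at the start of an input line is ignored.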

Equipped with these pieces of information, we are able to figure out what the purpose of the % character in the following code chunk is:

\noindent
\begin{minipage}{0.5\textwidth}
...
\end{minipage}%
\begin{minipage}{0.5\textwidth}
...
\end{minipage}

The total width of this construct is 0.5\textwidth+0.5\textwidth=1\textwidth. Had the % character been omitted, the end-of-line character after the first instance of \end{minipage} would have been converted to whitespace and the total width would have been 1\textwidth plus the width of the space character. That is almost certainly not what is wanted. Observe also that it was important to place the % character immediately after \end{minipage}; writing \end{minipage} % would not serve the intended purpose.
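One way to check this claim is to typeset both variants into boxes and print their widths. The following is a sketch under stated assumptions: the box names \withpercent and \withoutpercent are hypothetical, chosen only for this illustration, and the article class is assumed.

```latex
\documentclass{article}
% Hypothetical box names, for illustration only
\newsavebox{\withpercent}
\newsavebox{\withoutpercent}
\begin{document}
% Variant 1: % after the first \end{minipage} hides the end of line
\sbox{\withpercent}{%
  \begin{minipage}{0.5\textwidth}left\end{minipage}%
  \begin{minipage}{0.5\textwidth}right\end{minipage}%
}
% Variant 2: the end of line after the first \end{minipage} becomes a space
\sbox{\withoutpercent}{%
  \begin{minipage}{0.5\textwidth}left\end{minipage}
  \begin{minipage}{0.5\textwidth}right\end{minipage}%
}
\noindent
Text width: \the\textwidth\par
\noindent
With \%: \the\wd\withpercent\par
\noindent
Without \%: \the\wd\withoutpercent
\end{document}
```

The first box should come out exactly as wide as \textwidth, while the second should be wider by one interword space of the current font. In running text that extra space typically means the two minipages no longer fit side by side on one line.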

Next, consider the following two macro definitions:

\newcommand\cmdA[1]{
        #1}
\newcommand\cmdB[1]{%
        #1}

Both definitions are legal. In particular, it is not strictly necessary, from a purely syntactic point of view, to provide a % character after \newcommand\cmdA[1]{ in order to get the macro to compile. But this doesn't mean that the two macros produce the same output. Can you guess what \cmdA{abc}\cmdA{abc} and \cmdB{abc}\cmdB{abc} will output? For simplicity, please assume that both command sequences occur at the start of an input line.

Sure enough, \cmdB{abc}\cmdB{abc} outputs "abcabc", without whitespace between the "abc" sub-strings.

In contrast, \cmdA{abc}\cmdA{abc} outputs "abc abc". Why? Each instance of \cmdA{abc} produces abc preceded by a space; that space is there because the single end-of-line character after \newcommand\cmdA[1]{ is converted to whitespace. The whitespace contributed by the first instance of \cmdA{abc} is ignored by TeX since it occurs, by assumption, at the start of a line that begins a new paragraph, where TeX is still in vertical mode and discards spaces; however, the whitespace contributed by the second instance of \cmdA{abc} is not. That's how you end up with abc abc.
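The two definitions can be compared directly in a complete document. This is a minimal sketch, assuming the article class; each test line starts a new paragraph, and the \meaning lines are an addition that prints the stored definitions so the extra space token becomes visible.

```latex
\documentclass{article}
\newcommand\cmdA[1]{
        #1}
\newcommand\cmdB[1]{%
        #1}
\begin{document}
% Each test line below starts a new paragraph.
\cmdA{abc}\cmdA{abc}\par   % typesets "abc abc"
\cmdB{abc}\cmdB{abc}\par   % typesets "abcabc"
% Print the stored definitions for comparison:
\texttt{\meaning\cmdA}\par
\texttt{\meaning\cmdB}
\end{document}
```

The printed definition of \cmdA should show a space between -> and #1, whereas that of \cmdB should not.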

Addendum, prompted by a comment by @PhelypeOleinik: Some readers may have gotten the impression from the preceding discussion that, whereas failing to terminate some lines within, say, a macro definition with % can be a mistake, surely terminating every line within that macro definition with % cannot hurt. Unfortunately, that would also be a mistake.

Consider the following, admittedly somewhat contrived example. It features 5 instances of %, only one of which matters:

\ifnum1=0%
   1%
   a%
\else%
   b%
\fi

Here, \ifnum<u>=<v> is a conditional that checks whether <u> and <v> are numerically equal. (If either <u> or <v> is non-numeric, an error message is issued. And, just to fix ideas: 1 and 01 are numerically equal.)

What would you guess (La)TeX will output: 1, 1a, a, or b? If you guessed b -- after all, 1 is definitely not equal to 0! -- you would have guessed wrong. The correct answer is a. Why?

The presence of the % character immediately after \ifnum1=0 gobbles up the invisible end-of-line character; thus, TeX keeps scanning ahead until it encounters the first non-numeric character in order to set up the conditional test. Hence, the condition that actually gets evaluated by TeX is \ifnum1=01 -- remember: whitespace at the start of a line is ignored -- which is true. That's why a gets typeset.

If, however, one had written either

\ifnum1=0
   1%
   a%
\else%
   b%
\fi

or, better still,

\ifnum1=0
   1
   a
\else
   b
\fi

the output would be b, since TeX would "see" the end-of-line character after \ifnum1=0, convert it to a space character, treat that space as the end of the number 0, evaluate whether 1=0 is true, and select the appropriate branch of the conditional "tree" structure.
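The pitfall and one possible safeguard can be tried side by side. This is a sketch, assuming the article class; ending the number with \relax is a common idiom but is not part of the original answer, which ends the number with the end-of-line space instead.

```latex
\documentclass{article}
\begin{document}
% Variant 1: % hides the end of line, so TeX reads the number as 01.
% The test is 1=01, which is true, and "a" is typeset.
\ifnum1=0%
   1%
   a%
\else%
   b%
\fi
\par
% Variant 2: \relax ends the number explicitly, so the trailing % is harmless.
% The test is 1=0, which is false; the "1" is skipped and "b" is typeset.
\ifnum1=0\relax%
   1%
   a%
\else%
   b%
\fi
\end{document}
```

In both variants the only thing that matters is what follows the digit 0: another digit extends the number, while a space or \relax ends it.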

Related Solutions

[Tex/LaTex] Logical “and” character in TeX (⋀)

$((a\implies b) \land (c\implies a)) \implies (c \implies b)$

\land and \wedge are synonymous.

[Tex/LaTex] What does the \the\everypar do

\everypar is a token list register, so its value is assigned by

\everypar={<tokens>}

Its default value in Plain TeX is empty.

What is it for? Its contents are inserted into the token list TeX is reading when it switches from vertical to horizontal mode. This happens when TeX, in vertical mode, sees a horizontal command, for instance a character to be typeset or \indent or \noindent.

At that time the horizontal command is set aside; before examining it again, TeX contributes \parskip glue to the vertical list and goes into horizontal mode; now it delivers the contents of \everypar as if \the\everypar were implicit and reexamines the horizontal command that forced the switch to horizontal mode. However, when the switch to horizontal mode is caused by a horizontal command other than \noindent, TeX inserts an empty box of width \parindent before the contents of \everypar.

One can use \everypar to perform some job automatically at the start of each paragraph, for example to number paragraphs.

LaTeX makes heavy use of this mechanism, so it's not recommended to play with it when the job involves lists. An example is the declaration
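As a rough illustration of paragraph numbering, here is a minimal sketch, assuming the article class and plain running text without lists or sectioning commands, both of which reset \everypar in LaTeX. The counter name parno is hypothetical, chosen only for this example.

```latex
\documentclass{article}
% Hypothetical counter name, for illustration only
\newcounter{parno}
\begin{document}
% Set after \begin{document}. At each paragraph start, step the counter
% and print its value to the left of the current position.
\everypar{\stepcounter{parno}\llap{\theparno\quad}}
First paragraph.

Second paragraph.
\end{document}
```

Because the contents of \everypar are inserted after the indentation box, the \llap places the number inside the paragraph indentation.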

\everypar{\@nodocument}

that is in force until \begin{document} is processed; \@nodocument issues the error message Missing \begin{document} in order to warn the user that text starting a paragraph has been found in the preamble.

As with all token registers, its value can be extended with

\everypar=\expandafter{\the\everypar<tokens>}

since \the\everypar will deliver the current register's contents.
