[TeX/LaTeX] How are space tokens and empty lines processed by long commands (as their potential arguments)?

line-breaking macros spacing tex-core token-lists

How are spaces and empty lines processed by long commands (i.e., those that accept paragraph breaks inside their arguments)? Are there different space tokens aside from " " and an empty line? It appears that an empty line counts as exactly one empty argument:

\documentclass{article}

\newcommand{\oneArg}[1]{}
\newcommand{\twoArgs}[2]{}
\newcommand{\threeArgs}[3]{}

\begin{document}

\indent
A \oneArg

B

% output:
% A B


A \twoArgs

B

C

% output:
% A
% C

A \threeArgs

B

C

% output:
% A C

\end{document}

And is there anything special one needs to know about math mode in this regard?

One pointer: Some relevant information is in this question, especially this discussion thread about \somecommand * being legal LaTeX.


Addendum: An interesting detail about short macros (those defined, for example, with \newcommand*): if I add \newcommand*{\noPar}[1]{#1} to my source code and try to compile the additional code chunk

\noPar{
A \threeArgs

B

C
}

the compiler will throw an error. As this is semantically not a paragraph break, the long-short distinction between commands should probably be described in terms of empty lines, not paragraph breaks. Or not?

Best Answer

The answer to the question in the title is technically "they are not processed" but I don't think that's the answer you want.

If you modify your definitions to

\newcommand{\oneArg}[1]{\long\def\a{[#1]}\typeout{\meaning\a}}
\newcommand{\twoArgs}[2]{\long\def\a{[#1][#2]}\typeout{\meaning\a}}
\newcommand{\threeArgs}[3]{\long\def\a{[#1][#2][#3]}\typeout{\meaning\a}}

you will see

\long macro:->[\par ]
\long macro:->[\par ][B]
\long macro:->[\par ][B][\par ]

Assuming normal catcodes are in force, TeX turns a blank line into the token \par (literally the control sequence token named \par, not the primitive paragraph-ending operation). It does this at a very early stage, as characters are being tokenised, so before any token lists are passed to a macro. A macro therefore never sees a blank line in its argument; the behaviour is always as if you had replaced the blank line by \par in the input file.
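
As a small check of the equivalence between a blank line and an explicit \par, here is a sketch that grabs each as an undelimited argument. The names \showArg and \seen are hypothetical helpers introduced only for this illustration; normal catcodes are assumed.

```latex
\documentclass{article}
% \showArg (hypothetical helper): store the argument in \seen and
% write the stored tokens to the terminal and the log file.
\newcommand{\showArg}[1]{\long\def\seen{[#1]}\typeout{\meaning\seen}}
\begin{document}
% Case 1: a blank line follows the macro, so the tokeniser emits \par.
\showArg

X

% Case 2: an explicit \par in the same position.
\showArg\par X
\end{document}
```

Both calls should write \long macro:->[\par ] to the log, since the macro receives the same token either way.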

Space tokens are similarly processed at this early stage. Spaces at the end of a line and at the beginning of the next are discarded and never tokenised at all, so macros have no record of them. (You cannot prevent the discarding of spaces at the end of a line, even if you change the catcode of space.) A run of space characters produces only one space token. It is the tokens, not the file characters, that are passed to a macro.
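
The following sketch illustrates this with two inputs that differ only in their spacing. \showArg and \seen are hypothetical helper names, and normal catcodes are assumed. In the second call the line end itself becomes the single space token, while the indentation on the next line is skipped.

```latex
\documentclass{article}
% \showArg (hypothetical helper): store the argument in \seen and
% write the stored tokens to the terminal and the log file.
\newcommand{\showArg}[1]{\long\def\seen{[#1]}\typeout{\meaning\seen}}
\begin{document}
% A run of spaces inside one line collapses to one space token.
\showArg{x      y}
% The line end gives one space token; leading spaces on the next
% line are never tokenised.
\showArg{x
      y}
\end{document}
```

Both calls should write \long macro:->[x y] to the log.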

If you have undelimited arguments, as in your example, any space tokens are skipped while TeX looks for the argument; if you want a space to be the argument, you need { }. \par can be an undelimited argument if the macro is \long; otherwise it is an error.
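
A minimal sketch of these three cases follows. \showTwo, \shortArg and \seen are hypothetical names chosen for the illustration. The failing case is left commented out so that the file compiles.

```latex
\documentclass{article}
% \showTwo (hypothetical helper): a \long macro that reports both arguments.
\newcommand{\showTwo}[2]{\long\def\seen{[#1][#2]}\typeout{\meaning\seen}}
% \shortArg (hypothetical helper): a short macro, defined with \newcommand*.
\newcommand*{\shortArg}[1]{\def\seen{[#1]}\typeout{\meaning\seen}}
\begin{document}
% The space token after {A} is skipped; the second argument is B.
\showTwo{A}   B
% Braces are needed to pass a space token as the argument.
\showTwo{A}{ }
% A blank line (that is, \par) as the argument of a short macro fails.
% Removing the leading % from the three lines below (leaving the
% middle one empty) stops with
% "! Paragraph ended before \shortArg was complete."
% \shortArg
%
% B
\end{document}
```

The two \showTwo calls should write \long macro:->[A][B] and \long macro:->[A][ ] to the log. The commented-out case is the same situation as the \noPar example in the question: the blank lines inside the braces have already become \par tokens by the time the short macro scans its argument.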