[TeX/LaTeX] When are underscores allowed and when are they not allowed

underscore

I used to think that the behavior of underscores ('_') was as follows:

IF ('_') {
    BUFFER NEXT ITEM
    MAKE THE NEXT ITEM A SUBSCRIPT
    SEND SUBSCRIPT TO OUT STREAM
}
ELSE IF ('\_')
    SEND UNDERSCORE CHARACTER TO OUT STREAM

I was wrong.

%% Use underscore for subscript while not in math mode
%% ERROR!
%X_5

% Use underscore in math mode to make a subscript
% No error
$ X_{subscript} $

% Escape sequence (backslash) tells LaTeX we want an underscore character
% instead of the 'make a subscript' meaning of '_'
% No error
X \_ \_ \_ Oh look! underscore characters.

% underscore in the label name
% No error
\begin{equation}
    x = y \label{LAB_BY}
\end{equation}

%% Do not want subscript functionality of '_'
%% Use backslash to put an underscore into the label name
%% put a space after '\_' so that the escape reads only '_' and not '_BY'
%% ERROR
%\begin{equation}
%    x = y \label{LAB\_ BY}
%\end{equation}

%% Do not want subscript functionality of '_'
%% Use backslash to put an underscore into the label name
%% fail to put a space char after '\_'
%% ERROR
%\begin{equation}
%    x = y \label{LAB\_BY}
%\end{equation}

When are underscores allowed and when are they not allowed?
What does '_' signify if you are not in math mode?

Best Answer

No special behaviour is assigned to any character in TeX; everything depends on the current catcode regime.

If \catcode95=11 (often written as \catcode`\_=11) then _ is a letter and you can use it anywhere you can use x, so

\catcode`\_=11

a_b  \def\one_two_three{four}  \one_two_three

is all good and would typeset a_b four.

But normally _ has catcode 8, which means it acts as a subscript marker if encountered in math mode, and gives an error if the character token would otherwise be typeset directly in text mode.

However, in other uses of the token it is just a character token, so for example
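
As a small illustration under the default catcodes (a minimal sketch; the document itself is just a hypothetical test file):

\documentclass{article}
\begin{document}
$x_1$   % math mode: _ starts a subscript
% x_1   % text mode: uncommenting this line gives "Missing $ inserted"
\end{document}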

\newcommand\foo{a___jd_ \_ }

is legal and defines \foo to be that sequence of tokens (it may possibly generate an error if used, but not necessarily, depending on context).
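
A simpler sketch of the same point, using a hypothetical macro \sub that is not part of the original example:

\newcommand\sub{x_1}   % fine: the definition only stores the tokens
$\sub$                 % fine: _ is encountered in math mode
% \sub                 % error: _ would be typeset in text mode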

Similarly, in a \write or \csname (both constructs used by LaTeX's \label macro) any non-active legal token just acts as itself, so \csname one_two_three\endcsname constructs the control sequence with name one_two_three, which is the same as the \one_two_three accessed above by use of catcode changes.

Note that \_ is just the control sequence with name _; it is not forced to produce an underscore. It does by default in LaTeX, but just as \\ doesn't produce a backslash, you could define \_ to do anything:
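
For instance, the following sketch defines and uses that control sequence without any catcode change:

\expandafter\def\csname one_two_three\endcsname{four}
\csname one_two_three\endcsname   % typesets four

This is why \label{LAB_BY} from the question works. In \label{LAB\_BY}, by contrast, \_ is a macro rather than a character token, and its expansion is not a plain character that \csname can accept, which is presumably why those attempts fail.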

\def\_{zzzzz} \_

would produce zzzzz for example.

\_ is not a TeX primitive; LaTeX defines it to be the macro:

\DeclareRobustCommand{\_}{%
   \ifmmode\nfss@text{\textunderscore}\else\textunderscore\fi}
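
Reading that definition, \_ should give a text underscore in either mode (a short sketch, assuming the default LaTeX definition above is still in force):

a\_b     % text mode: \textunderscore is used directly
$a\_b$   % math mode: \textunderscore is wrapped in \nfss@text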