[TeX/LaTeX] What exactly do \csname and \endcsname do?


What exactly do \csname and \endcsname do? What is their job?
I have glanced at the TeXbook and some other books, but none of them was clear enough for me.
Can anyone please give a simple example to clarify this issue?

Best Answer

Normally, control sequence names are made only of letters or of one non-letter character.

A letter is, more precisely, a character that has category code 11 at the moment the control sequence name is read. So any character can become part of a multi-letter control sequence name, provided we change its category code to 11 before the definition and before each use.
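A minimal sketch of the category-code route, using LaTeX's \makeatletter (which sets the category code of @ to 11) and \makeatother (which sets it back to 12). The name \my@greeting is hypothetical, chosen only for illustration.

```latex
\documentclass{article}
\makeatletter % @ now has category code 11, i.e. it counts as a letter
\def\my@greeting{Hello} % hypothetical name containing @
\makeatother  % @ is back to category code 12 ("other")
\begin{document}
\makeatletter
\my@greeting % typesets Hello
\makeatother
\end{document}
```

Without the surrounding \makeatletter, TeX would read \my@greeting as the control word \my followed by the characters @greeting, which is why the category code must be changed before each use too.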

With \csname...\endcsname we are freed from this limitation, and every character can go between them to form a control sequence name (of course, % is excluded, because it disappears together with the rest of the line before TeX does its work on characters).
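As a small sketch (the name "my macro!" is hypothetical), here is a command whose name contains a space and an exclamation mark, something that could never be typed as a single control word:

```latex
\documentclass{article}
% The name "my macro!" contains a space and a "!", so it cannot be typed directly.
% \expandafter lets \csname...\endcsname build the token before \def sees it.
\expandafter\def\csname my macro!\endcsname{Hello from an unusual name}
\begin{document}
\csname my macro!\endcsname % the same construction is needed to use it
\end{document}
```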

However, this is not the main purpose of \csname...\endcsname. The construction is used to build commands from "variable parts". Think, for instance, of LaTeX's \newcounter: after \newcounter{foo}, TeX knows \thefoo, which is built precisely in this way. Roughly, what LaTeX does is

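\newcommand{\newcounter}[1]{%
   % allocate a count register named \c@<name>
   \expandafter\newcount\csname c@#1\endcsname
   % define \the<name> so that it prints the counter in arabic numerals
   \expandafter\def\csname the#1\endcsname{\arabic{#1}}%
}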

so that \newcounter{foo} does the right job. The real definition is more complicated, of course, but the main ingredients are here; \newcount is the low-level command that allocates a count register. The \expandafter makes TeX turn \csname...\endcsname into a single control sequence token before \newcount or \def sees it.
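A short usage sketch: after \newcounter{foo}, both the generated \thefoo and the register \c@foo (whose name contains @, so it is most easily reached with \csname) are available.

```latex
\documentclass{article}
\newcounter{foo} % creates the count register \c@foo and the macro \thefoo
\begin{document}
\setcounter{foo}{41}
\stepcounter{foo}
\thefoo\par                     % typesets 42
\the\csname c@foo\endcsname\par % typesets 42 as well, read directly from \c@foo
\end{document}
```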

Inside \csname...\endcsname, category codes don't matter (with one main exception: active characters are expanded unless preceded by \string; see the final note). LaTeX exploits this to build control sequence names that users can't access (easily). For example, the control sequence that selects the default ten-point font is \OT1/cmr/m/n/10; LaTeX can easily split its name internally (with \string, which is in a sense the "reverse" operation), while the casual user cannot type it.
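A minimal sketch of the round trip: \csname builds the font-selection token from characters, and \string turns that token back into characters so that its name can be printed (or taken apart).

```latex
\documentclass{article}
\begin{document}
% Typed directly, \OT1/cmr/m/n/10 would be read as the control word \OT
% followed by the characters 1/cmr/m/n/10. \csname builds the whole name,
% and \string turns the resulting token back into characters.
\texttt{\expandafter\string\csname OT1/cmr/m/n/10\endcsname}
\end{document}
```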

Another important use is in environments: when you say \newenvironment{foo}, LaTeX really defines \foo and \endfoo. Upon finding \begin{foo}, LaTeX does some bookkeeping and then executes \csname foo\endcsname (that's why one can also say \newenvironment{foo*}); similarly, at \end{foo} LaTeX executes \csname endfoo\endcsname and then does some more bookkeeping.
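To see that \begin and \end really just look up these names, here is a sketch that skips \newenvironment and defines the two commands by hand for a hypothetical starred environment shout*:

```latex
\documentclass{article}
% \begin{shout*} executes \csname shout*\endcsname and
% \end{shout*} executes \csname endshout*\endcsname,
% so defining these two commands is enough (shout* is a hypothetical name).
\expandafter\def\csname shout*\endcsname{\bfseries}
\expandafter\def\csname endshout*\endcsname{}
\begin{document}
Normal text. \begin{shout*}Bold text.\end{shout*} Normal again.
\end{document}
```

Because \begin opens a group, the \bfseries stays local to the environment. In real documents \newenvironment is of course the proper interface; this is only meant to expose the mechanism.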

Other uses: \label{foo} defines (on the next run, through the .aux file) a control sequence \r@foo built with \csname...\endcsname, which \ref{foo} then looks up.
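A small sketch that tests for such a label macro with the kernel's \@ifundefined, which itself relies on \csname. The label name sec:intro is just an example.

```latex
\documentclass{article}
\begin{document}
\section{Intro}\label{sec:intro}
% \label writes \newlabel{sec:intro}{...} to the .aux file; on the next run
% this defines \r@sec:intro, a name only reachable via \csname.
\makeatletter
\@ifundefined{r@sec:intro}
  {Label not yet known; run LaTeX again.}
  {Label known: see Section~\ref{sec:intro}.}
\makeatother
\end{document}
```

On the first run the first branch is taken; after a second run the label macro exists and the second branch is used.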

When one says \csname foo\endcsname, TeX checks whether \foo is defined; if it is not, TeX makes \foo equivalent to \relax, so the construction executes \relax, and from then on (respecting grouping) \foo is interpreted as \relax. An interesting use of this feature is that one can say

\chapter*{Introduction}
\csname phantomsection\endcsname
\addcontentsline{toc}{chapter}{Introduction}

and keep hyperref happy if it's loaded, while doing nothing if the package is not loaded.
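The side effect can be observed directly with e-TeX's \ifdefined (available in current LaTeX formats). The names \myfoo and \mybar are hypothetical and assumed not to be defined anywhere else.

```latex
\documentclass{article}
\begin{document}
\ifdefined\myfoo defined\else undefined\fi\par % typesets "undefined"
\csname myfoo\endcsname % \myfoo becomes \relax, so this line does nothing visible
\ifdefined\myfoo defined\else undefined\fi\par % typesets "defined"
{\csname mybar\endcsname}% \mybar is \relax only inside this group
\ifdefined\mybar defined\else undefined\fi\par % typesets "undefined"
\end{document}
```

The last two lines show the "respecting grouping" part: the \relax meaning disappears when the group ends.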

Many other interesting uses of this trick are possible. But one should always keep in mind that TeX fully expands whatever it finds between \csname and \endcsname, and that only character tokens may remain. So

\csname abc\relax def\endcsname

is an error (TeX reports "Missing \endcsname inserted"), because \relax is neither expandable nor a character. But after \def\xyz{abc},

\csname \xyz def\endcsname

is legal and equivalent to saying \csname abcdef\endcsname or \abcdef.
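Because expandable material is allowed, a common pattern is to compute part of the name, for instance from a counter. A minimal sketch of a tiny "array" (the names \fruit1, \fruit2, \fruit3 and the counter i are hypothetical):

```latex
\documentclass{article}
% The names \fruit1, \fruit2, \fruit3 contain digits,
% so they can only be built (and used) with \csname...\endcsname.
\expandafter\def\csname fruit1\endcsname{apple}
\expandafter\def\csname fruit2\endcsname{banana}
\expandafter\def\csname fruit3\endcsname{cherry}
\newcounter{i}
\begin{document}
\setcounter{i}{2}
% \arabic{i} is fully expandable and leaves the digit 2, giving \fruit2
\csname fruit\arabic{i}\endcsname % typesets banana
\end{document}
```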

Final note

A word about category codes is in order. An active character inside \csname...\endcsname is expanded, so to get a literal ~ one has to write \string~. Comment (category 14), ignored (category 9) and invalid (category 15) characters keep their usual behavior. So

\csname %\endcsname

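will give an error (Missing \endcsname), because % hides \endcsname together with the rest of the line; in \csname ^^@\endcsname the null character is ignored (it has category 9 in plain TeX and LaTeX), so the name is empty; and \csname ^^?\endcsname raises an error, because ^^? has category 15.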
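A short sketch of the \string~ workaround; the name a~b is hypothetical:

```latex
\documentclass{article}
% ~ is active: inside \csname it would be expanded into non-character tokens
% and TeX would stop with an error. \string~ yields a plain ~ character instead.
\expandafter\def\csname a\string~b\endcsname{the tilde is part of the name}
\begin{document}
\csname a\string~b\endcsname
\end{document}
```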
