When TeX parses input, it assigns each character read a category code. How TeX subsequently interprets the input then depends on both the character and its category code. There are 16 category codes that can be set by the programmer, plus one special internal one. The 16 standard ones number from 0 upward. Category code 0 is for escape characters, usually `\`. The rest are then (with typical examples):
- Begin group (1): `{`
- End group (2): `}`
- Math shift (3): `$`
- Alignment (4): `&`
- End-of-line (5)
- Parameter for macros (6): `#`
- Math superscript (7): `^`
- Math subscript (8): `_`
- Ignored entirely (9)
- Space (10)
- Letters (11): the alphabet.
- 'Other' character (12), everything else: `.`, `1`, `:`, etc.
- Active character (13), to be interpreted as a control sequence: `~`
- Start-of-comment (14): `%`
- Invalid-in-input (15): `[DEL]`
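A quick way to check the list above is to ask TeX for the current values with `\the\catcode`. The following is only a minimal sketch, assuming the default plain TeX setup (LaTeX uses the same values for these characters); the labels in the messages are arbitrary.

```tex
% Report a few category codes on the terminal and in the log.
% The values in the comments assume plain TeX's default assignments.
\message{escape: \the\catcode`\\}      % 0
\message{begin group: \the\catcode`\{} % 1
\message{math shift: \the\catcode`\$}  % 3
\message{letter a: \the\catcode`\a}    % 11
\message{digit 1: \the\catcode`\1}     % 12
\message{tilde: \the\catcode`\~}       % 13
\bye
```

Writing the character as a control symbol (`\{`, `\$`, and so on) after the backquote keeps the braces balanced and avoids triggering the character's special meaning while it is being queried.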
Now when TeX reads input, each character is associated with a category code to generate tokens. So if the input reads
$ 1^{23}_a $
TeX reads:
- A math shift token, and goes into math mode
- A space, which is ignored in math mode
- An 'other' token `1`, which is simply typeset here
- A math superscript token, meaning that the next item will be superscripted
- A begin-group token `{`
- The 'other' tokens `2` and `3`, which cannot be typeset until the group finishes
- The end-group token `}`, which allows TeX to typeset the superscript
- A math subscript token, which moves the next item to a subscript position
- The letter `a`, which has no special meaning and is typeset
- A space, again ignored
- A math shift token, and goes back into horizontal mode
Category codes often become important when TeX is deciding on what is and is not a control sequence. With only the alphabet as 'letters', something like
\hello@
is the control sequence `\hello` followed by the 'other' token `@`. On the other hand, if I make `@` a letter
\catcode`\@=11\relax
\hello@
then TeX will look for a macro called `\hello@`. This is commonly used in TeX code to isolate 'code' macros from 'user' ones, so you find programming macros such as `\@for`. Without changing the category code, such a macro is effectively 'hidden'. The idea is to 'protect the user from themselves': it's hard to break the code if you cannot even get at it!
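As an illustration of this 'hiding', here is a minimal plain TeX sketch. The names `\my@helper` and `\greet` are hypothetical and chosen only for this example; in LaTeX the two `\catcode` lines are usually written as `\makeatletter` and `\makeatother`.

```tex
% Hypothetical names (\my@helper, \greet), for illustration only.
\catcode`\@=11 % treat @ as a letter while the internals are defined
\def\my@helper#1{Hello, #1!}
\def\greet#1{\my@helper{#1}}
\catcode`\@=12 % restore @ to 'other'
\greet{world} % typesets: Hello, world!
% Typing \my@helper at this point would be read as \my followed by @helper,
% so the internal macro cannot be reached by accident.
\bye
```

The user-level macro `\greet` keeps working after the category code is restored, because `\my@helper` was already stored as a single token inside its definition.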
There are many effects that can be achieved using category codes. An obvious one is the non-breaking space `~` used throughout the TeX world. This works because `~` has category code 13, and is therefore 'active'. When TeX reads `~`, it looks for a definition for `~` in the same way it would for a macro. That's a lot more convenient than using a macro for these cases.
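To make the mechanism concrete, the sketch below sets up a tie by hand. It is an approximation of what plain TeX does for `~`, not a copy of the format's source, and both lines are redundant in plain TeX and LaTeX, where `~` is already active.

```tex
% Rough sketch of a non-breaking space built from an active character.
\catcode`\~=13           % make ~ active
\def~{\penalty10000\ }   % forbid a line break here, then insert an explicit space
See Fig.~1 for details.
\bye
```

Once a character is active, it can be defined with `\def` just like a control sequence, which is why no backslash is needed when using it.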
We can use different category codes to make 'private' code areas. For example, plain TeX and LaTeX2e use `@` as an extra 'letter', whereas LaTeX3 uses `:` and `_`. That effectively isolates internal LaTeX3 code from LaTeX2e, when the two are used together (as at present).
Verbatim material is another area where category codes are vital (if complex!). The reason you can't nest verbatim material inside anything else is that once TeX has assigned category codes, the assignment is only partially reversible. Anything which is 'ignored' or 'comment' is thrown away: you can't get it back. (With e-TeX, you can reassign category codes, but anything that is already gone stays 'lost'.)
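The point that category codes are fixed once the input has been read can be seen in a small sketch. The macro name `\late` is hypothetical; the example assumes plain TeX, where `~` is active.

```tex
% \late is a hypothetical macro that tries to change the category code
% of ~ after its argument has already been read.
\def\late#1{\begingroup\catcode`\~=12 #1\endgroup}
\late{a~b} % too late: ~ was tokenized as active when the argument was read,
           % so it still acts as a tie
\begingroup\catcode`\~=12 a~b\endgroup
           % here the change happens before ~ is read, so ~ becomes an
           % 'other' character and is typeset as a glyph from the current font
\bye
```

This is the same reason verbatim commands fail inside the argument of another macro: by the time the verbatim code runs, the characters already carry their ordinary category codes.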
(Note for the interested) The 'special' category code is 16, which is used in the `\ifcat` test, amongst other things. It is assigned to unexpandable control sequences in this situation, so that they do not match anything other than other unexpandable control sequences.
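A small sketch of this behaviour, using two primitives that are unexpandable (`\relax` and `\hbox`) and an ordinary letter; the words `same` and `different` are just labels for the two branches.

```tex
% Two unexpandable control sequences share the special category code.
\message{\ifcat\relax\hbox same\else different\fi} % same
% An unexpandable control sequence never matches a character token.
\message{\ifcat\relax a same\else different\fi}    % different
\bye
```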
`\task` takes one argument, passing it to `\@task`, which is defined in such a way that its arguments are delimited; if the call is
\@task xyz:AB:cde:u\@nil
the first argument is `xyz`, the second is `AB` and the third is `cde:u`. Here `\@nil` doesn't mean anything; it is only required by the syntax of `\@task`, and TeX throws it away.
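The actual definitions of `\task` and `\@task` are not shown here, so the following is only a hypothetical sketch of how such a pair of macros could be written with delimited parameters. The bracketed output is an arbitrary choice to make the three arguments visible.

```tex
\catcode`\@=11 % allow @ in macro names
% Hypothetical definitions, for illustration only.
\def\task#1{\@task#1\@nil}
% #1 ends at the first colon, #2 at the second, #3 at \@nil.
\def\@task#1:#2:#3\@nil{[#1] [#2] [#3]}
\catcode`\@=12
\task{xyz:AB:cde:u} % typesets: [xyz] [AB] [cde:u]
\bye
```

Because the third parameter is delimited by `\@nil` rather than by a colon, any further colons simply become part of the third argument.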
`\relax` is a primitive of TeX; its function is "do nothing". The test
\if\relax\detokenize{#2}\relax
is a safe way to determine if the argument `#2` is empty. If it is, `\detokenize{#2}` expands to nothing, so `\if` compares the tokens `\relax` and `\relax`, which are indeed equal, so the "true" branch is followed, which starts immediately after the second `\relax` and runs up to and excluding `\else`. If `#2` is not empty, say it's `30`, `\if` will compare `\relax` with `3`, which are different, so the "false" branch is followed, which starts after `\else` and runs up to and excluding `\fi`.[1]
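The test can be wrapped in a small macro to try it out. This is a minimal sketch: `\checkempty` is a hypothetical name, the argument is `#1` rather than `#2` because the macro takes only one, and an engine with the e-TeX extensions (for example pdfTeX) is assumed for `\detokenize`.

```tex
% \checkempty is a hypothetical wrapper around the test described above.
\def\checkempty#1{%
  \if\relax\detokenize{#1}\relax
    empty%
  \else
    not empty%
  \fi}
\message{\checkempty{}}   % empty
\message{\checkempty{30}} % not empty
\bye
```

Since every step is expandable, the macro can be used inside `\message`, `\edef` and similar contexts.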
The similar construct `\if\relax#2\relax` does not work in all cases, because "all control sequences are equal as far as \if is concerned" [2]. It would not work if `#2` was `\relax` (or any other control sequence, possibly followed by other tokens)! So we use `\detokenize`, which, as explained also by Joseph, splits everything into a string. So, even in the weird case that `#2` is `\relax`, `\if` would compare the token `\relax` with the character `\`, which are different.
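The failure of the simpler construct can be reproduced with a sketch like the following, where `\naive` is a hypothetical name for a macro built on the test without `\detokenize`.

```tex
% \naive is a hypothetical macro using the unprotected test.
\def\naive#1{\if\relax#1\relax empty\else not empty\fi}
\message{\naive{}}       % empty, as expected
\message{\naive{\relax}} % takes the "true" branch although the argument
                         % is not empty; the log shows \relax empty
\bye
```

In the second call `\if` compares the first two `\relax` tokens, finds them equal, and leaves the third `\relax` in front of the text of the "true" branch.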
[1] This is not strictly true, but it's an approximation of the truth sufficient for the purpose of this description.
[2] TeX by Topic (section 13.2.1)
Best Answer
Three questions there, but I think you'll be let off!
'Balanced text' means that the argument has to have balanced grouping characters, usually `{` and `}` pairs. This is because `\detokenize` requires an argument starting with a token with category code 1 (begin-group), in the same way as a token register. Indeed, you can do very similar things with a token register and with `\detokenize`.

On the category codes in `\jobname`, there are a number of places where you get a 'string' from TeX where everything except spaces has category code 12. You see the same with `\the\<somedimen>` and `\meaning` (more on the latter in a moment). You'd have to ask DEK for the full story, but my understanding is that this 'string' approach is used so that no tokens are accidentally added to a control sequence name. There are places where if they were 'letters' then trouble might arise.

Finally, on the approach before e-TeX. As I said, `\jobname` is not the only place where you see 'string' output. In particular, `\meaning` does the same. So if you take the `\meaning` of both names and compare the results, the test will be true if the two names agree as lists of characters. There are variations on this method, see for example LaTeX's `\strip@prefix`, which can be used to make a 'string' without any prefix.

(As pointed out by Martin Scharrer, LaTeX's `\@onelevel@sanitize` does the same job: `\@onelevel@sanitize\testa` is equivalent to applying `\strip@prefix` to the `\meaning` of `\testa`. To show what is going on it's clearer to see the `\meaning`, but in use you'd pick `\@onelevel@sanitize`.)
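The comparison described above might be sketched as follows. This is a reconstruction under stated assumptions, not the original code: the name `myfile` is a hypothetical job name to test against, `\testb` is a scratch macro introduced here, and `\strip@prefix` is defined locally in the way LaTeX defines it so that the sketch also runs in plain TeX.

```tex
\catcode`\@=11 % allow @ in macro names, as in LaTeX internals
% Same definition as in LaTeX: discard everything up to the first >.
\def\strip@prefix#1>{}
\def\testa{myfile}    % hypothetical name to compare with the job name
\edef\testb{\jobname} % characters of category code 12
% \meaning\testa gives "macro:->myfile" as a string; \strip@prefix removes
% the "macro:->" part, leaving the name with category code 12 throughout.
\edef\testa{\expandafter\strip@prefix\meaning\testa}
\ifx\testa\testb
  \message{names agree}
\else
  \message{names differ}
\fi
\catcode`\@=12
\bye
```

The `\ifx` test compares the two macros token by token, including category codes, which is why `\testa` has to be converted to a 'string' before the comparison.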