Disclaimer: I am aware that one should not use special characters in macro names and do not recommend doing that (on the contrary). I ask this question purely out of curiosity.
Until recently I thought that only "ordinary" characters could be used in macro names, i.e. letters (a–z, A–Z) and common symbols like digits (0–9) or punctuation (e.g. -, !). Following a question on this site I discovered that this is not true: even accented letters are allowed (and can be input directly when also using inputenc):
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\makeatletter
\begin{document}
\expandafter\def\csname \c c\v c\'e\endcsname{I am weird}
\csname \c c\v c\'e\endcsname:
\expandafter\string\csname \c c\v c\'e\endcsname
% with inputenc:
\def\äöü{Me too}
\äöü:
% \string\äöü does not work
\expandafter\string\csname äöü\endcsname
\end{document}
However, some characters, like \v o or \ss, give me errors. Input of ß (with inputenc), on the other hand, works just fine. Surprisingly, this did not work at all without the use of fontenc.
- Can you give a precise rule for which characters are admissible in macro names?
- Why is there a difference between writing \ss and writing ß?
- Why is \v c legal but \v o not?
- Why does \def\äöü work but \string\äöü not?
- Why does this only work when using \usepackage[T1]{fontenc}?
Best Answer
Absolutely all bytes 0 to 255 are admissible in macro names. But how convenient they are to type, and how they correspond to characters in the human-visible sense, can depend, among other things, on the catcodes and on the definitions of active characters, which in turn can depend on the packages currently loaded (the input encoding and the font encoding).
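As a minimal sketch of the claim that any character can end up in a macro name, the following builds a name containing a space, a digit and punctuation with \csname. The name `item 2!` is purely illustrative, and the example assumes no package has made `!` active.

```latex
\documentclass{article}
\begin{document}
% \csname ... \endcsname builds a control sequence from whatever character
% tokens it finds, so spaces, digits and punctuation can be part of the name.
\expandafter\def\csname item 2!\endcsname{odd but legal}
% Such a name cannot be typed after a backslash; it has to be rebuilt
% with \csname each time it is used.
\csname item 2!\endcsname
\end{document}
```

Whether such a name is convenient to type is a separate matter, which is what the rule below is about.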
The precise rule is that a macro is either:
- A single active character: a token with 13 as the category code, and any number 0–255 as the character code.
- A control word: an escape character (\) followed by a sequence of letters (tokens with 11 as the category code, and any number 0–255 as the character code).
- A control symbol: an escape character (\) followed by a single non-letter (token with anything other than 11 as the category code, and any number 0–255 as the character code).

Before answering the rest of your questions, some explanation.
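For reference, the three kinds of macro in the rule above can be sketched with plain ASCII names. The names `\weird` and `\?` and the choice of `|` as the active character are illustrative assumptions; `\?` is assumed to be otherwise undefined.

```latex
\documentclass{article}
\begin{document}
% Control word: escape character followed by letters (catcode 11).
\def\weird{control word}
% Control symbol: escape character followed by one non-letter.
% (\? is assumed to be otherwise undefined here.)
\def\?{control symbol}
% Active character: a single token with catcode 13.
% The group keeps the catcode change local.
\begingroup
  \catcode`\|=13
  \def|{active character}
  \weird, \?, |
\endgroup
\end{document}
```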
Like most software systems, TeX (specifically, non-Unicode TeX, i.e. Knuth TeX or pdfTeX, as opposed to XeTeX or LuaTeX) understands only bytes (0 to 255); it doesn't understand “characters” as such. (And like most pre-Unicode systems, its terminology sometimes uses “bytes” and “characters” interchangeably, which can be misleading.) To give the illusion of “understanding” bytes as characters, two “translations” happen:
- Font encoding: this says where the shapes (glyphs) for certain (what we think of as) characters are “supposed” to be in a font: e.g. under the default (OT1) encoding (and also under the T1 encoding), position 65 (octal '101, hexadecimal "41) is supposed to contain something that looks like an “A”. Position 231 (hexadecimal "E7) is supposed to contain a glyph for “ç” in the T1 encoding, and is not supposed to contain anything in the default (OT1) encoding. Correspondingly, the fontenc package redefines the meanings of \c etc. as appropriate.
- Input encoding: with \usepackage[utf8]{inputenc}, this sets up certain characters (bytes) as active, so that UTF-8 sequences of bytes can be interpreted as the corresponding Unicode character.

Also: TeX has a way of directly inputting a specific byte in the input file, by ^^ followed by two hex digits (0123456789abcdef), e.g. anywhere you can type 'A' (in text, in a macro name, whatever), you can also type ^^41, etc. Let's use that for clarity.

With that understanding, the two examples in the question are:
\csname \c c\v c\'e\endcsname: here, with \usepackage[T1]{fontenc}, the definitions of \c, \v and \' are such that \c c expands to a token with category code 11 and character code 231 (hex e7), \v c expands to a token with category code 11 and character code 163 (hex a3), and \'e expands to a token with category code 11 and character code 233 (hex e9).

So \csname \c c\v c\'e\endcsname is equivalent to \csname ^^e7^^a3^^e9\endcsname and, where those three bytes are read as letters, to simply \^^e7^^a3^^e9.

This is a macro of the “control word” type: a backslash followed by a sequence of three letters.
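The same kind of equivalence can be tried with plain ASCII bytes, which avoids any dependence on the input or font encoding. This is only a sketch: `\foo` is a hypothetical macro name, and ^^66 and ^^6f are the bytes for f and o.

```latex
\documentclass{article}
\begin{document}
\def\foo{same macro}
% ^^66, ^^6f, ^^6f are the bytes for f, o, o, so each of the three
% forms on the next line names the control word \foo.
\foo, \^^66^^6f^^6f, \csname ^^66^^6f^^6f\endcsname
\end{document}
```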
In \def\äöü{Me too}, the äöü in the input file is (assuming you've saved the file in the UTF-8 encoding) the sequence of bytes C3 A4 C3 B6 C3 BC. Further, \usepackage[utf8]{inputenc} changes the catcodes of all these bytes to active. So \def\äöü{Me too} is equivalent to \def\^^c3^^a4^^c3^^b6^^c3^^bc{Me too}.

This is a macro of the “control symbol” type: what it has actually defined is \^^c3 (a single nonletter), with the requirement that when used it must be followed by the tokens ^^a4^^c3^^b6^^c3^^bc, all of catcode 13. (Else you'll get something like “Use of \^^c3 does not match its definition”.)

Now to answer the rest of your questions:
\v c expands to the token with category code 11 (letter) and character code 163 (hex "A3), which is the position of the character č in T1. \v o does not expand to a single character token (there is a č but no ǒ in the T1 encoding), but to instructions to add an appropriate accent to the o character. Inside \csname ... \endcsname, everything must expand to just character tokens.

There's not much of a difference really between \ss and ß; just that you (I guess) tried the former inside \csname … \endcsname, and the latter directly after \def.

Unlike the earlier case where (for example) \c c expands to a single token with category code 11 and character code 231, \ss expands to \char"FF, that is, the TeX primitive command \char, followed by (if \char is being processed) the number "FF. (This is different from the token ^^ff, though why fontenc doesn't define \ss to expand to a single character token I don't know.) This too is not allowed inside \csname … \endcsname. ß too expands to something similar (you can't use it inside \csname … \endcsname either), but if you're using it directly after \def, then without expansion it's a sequence of two active characters ^^c3^^9f, and \def doesn't expand the tokens.

See above for why \def\äöü works: it's \def\^^c3^^a4^^c3^^b6^^c3^^bc. And \string\äöü is \string\^^c3^^a4^^c3^^b6^^c3^^bc, which is \string\^^c3 (which works: try it) followed by ^^a4^^c3^^b6^^c3^^bc (and the first byte there, the second byte of the UTF-8 representation of ä, has been defined as an active character that throws an error, because it should never appear on its own in valid UTF-8).

The definition of the control symbol, as in \def\äöü{Me too}, will work with or without \usepackage[T1]{fontenc}, and so will its usage. But if you want to use these “special” characters inside \csname ... \endcsname, then you need their definitions to be things that expand to just character tokens (which \usepackage[T1]{fontenc} does, because it can: those characters exist in the font), rather than expand to instructions for placing accents above/below other characters (which is what happens without \usepackage[T1]{fontenc}, as there's no alternative).
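The requirement that everything inside \csname ... \endcsname expand to character tokens can be illustrated with a small ASCII-only sketch. The helper `\suffix` and the name `foobar` are hypothetical, chosen only for this example.

```latex
\documentclass{article}
\begin{document}
% \suffix is a hypothetical helper that expands to plain character tokens.
\def\suffix{bar}
% Fine: inside \csname, \suffix expands to b, a, r, giving the name "foobar".
\expandafter\def\csname foo\suffix\endcsname{built from an expandable macro}
\csname foobar\endcsname
% Not fine: \char is an unexpandable primitive, so uncommenting the next
% line stops with "Missing \endcsname inserted".
% \csname foo\char"62 ar\endcsname
\end{document}
```

The commented-out line fails for the same reason as \ss under T1: the expansion ends in a \char command rather than in a character token.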