[TeX/LaTeX] The \lowercase trick


We have a number of questions, or rather answers, that use (and explain) the \lowercase “trick”:

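\begingroup               % keep the \lccode change local
\lccode`\~=`\.            % the lowercase counterpart of ~ is now .
\lowercase{\endgroup\def~}#1{foo with #1}% ~ is turned into an active . before \endgroup and \def run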

However, after reading about uppercase and lowercase codes, I started wondering how it can work at all. Quoting TeX by Topic:

To each of the character codes correspond an uppercase code and a lowercase code […]. These can be assigned by

\uccode<number><equals><number>

and

\lccode<number><equals><number>.

In IniTeX codes `a..`z, `A..`Z have uppercase code
`A..`Z and lowercase code `a..`z. All other
character codes have both uppercase and lowercase code zero.

The commands \uppercase{...} and \lowercase{...} go through their argument
lists, replacing all character codes of explicit character tokens by their uppercase and lowercase code respectively if these are non-zero, without changing the category codes.
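The defaults described in the quotation can be checked directly. The following is a minimal plain TeX sketch (assumed to be run with `tex`; the macro name `\demo` is only illustrative) that prints a few table entries to the terminal with `\message` and then applies `\lowercase` under the default codes.

```tex
% Query the default case codes (plain TeX keeps the IniTeX values for these).
\message{[lccode of A = \the\lccode`\A]}       % [lccode of A = 97], i.e. `a
\message{[uccode of a = \the\uccode`\a]}       % [uccode of a = 65], i.e. `A
\message{[lccode of period = \the\lccode`\.]}  % [lccode of period = 0]
\message{[lccode of tilde = \the\lccode`\~]}   % [lccode of tilde = 0]

% \lowercase changes letters via their nonzero lccodes and leaves
% characters whose lccode is 0 (here . and ~) untouched.
\lowercase{\gdef\demo{ABC.~}}
\message{[\meaning\demo]}                      % [macro:->abc.~]
\bye
```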

The last sentence got me thinking: how can the “trick” work if both ~ and the character that receives the definition (the period) have lowercase code 0? My naive expectation would be that

\lccode`\~=`\.

gives ~ the lowercase code of the period, which is 0, i.e. the same value ~ already had, so that it is equivalent to \lccode`~=0. But that would imply that the “trick” shouldn't work. So how come it does?

\documentclass{article}
\begin{document}

\verb+\the\lccode`. += \the\lccode`. \par
\verb+\the\lccode`~ += \the\lccode`~

\begingroup
\lccode`\~=`\.
\lowercase{\endgroup\def~}#1{foo with #1}

\catcode`\.=13

.{bar}

\end{document}


Best Answer

Your intuition is wrong. In TeX's syntax,

`\~

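is a number: precisely the character code (for characters like these, the ASCII code) of the character following the backquote. The character can also be given as a one-character control sequence, as with `\~ here, with exactly the same meaning; this accommodates cases like `\%, where a bare % would start a comment. So the assignment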

\lccode`\~=`\.

is just the same as

\lccode 126=46

so the lowercase counterpart of the tilde becomes the period, as far as \lowercase is concerned.
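A quick way to convince oneself that the backquote forms are just numbers is to print them. This is a minimal plain TeX sketch (assumed to be run with `tex`); `\number` converts each integer to its decimal digits.

```tex
% `\~ , `~ and 126 denote the same integer: the character code of ~.
\message{[\number`\~, \number`~, \number126]}   % [126, 126, 126]
\message{[\number`\.]}                          % [46]
% The control-symbol form is what makes % usable after the backquote.
\message{[\number`\%]}                          % [37]
\bye
```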

If you write

\lccode`\~=\lccode`\.

then you would copy the lccode of the period, which is 0, into the tilde's slot; that is a very different assignment.
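The difference between the two assignments can be seen side by side in a minimal plain TeX sketch (assumed to be run with `tex`). Each assignment is wrapped in a group so that the table is restored afterwards.

```tex
\begingroup
  \lccode`\~=`\.               % slot 126 := 46, the character code of .
  \message{[\the\lccode`\~]}   % [46]
\endgroup
\begingroup
  \lccode`\~=\lccode`\.        % slot 126 := contents of slot 46, which is 0
  \message{[\the\lccode`\~]}   % [0]
\endgroup
\bye
```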


A few more words may help beginners. One can think of \lccode as an array of length 256, indexed from 0. The command

\lccode <number1> = <number2>

stores <number2> in the slot numbered <number1>. When \lowercase does its job and finds a character with charcode x, it looks in slot x of the \lccode array. If the number stored there is 0, \lowercase leaves the token alone; otherwise it finds a number y > 0 and replaces charcode x with charcode y, keeping the catcode unchanged.
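The array model can be tried out with a minimal plain TeX sketch (assumed to be run with `tex`; `\demo` is a hypothetical macro name). Slot 65 (A) is temporarily pointed at z, while the other slots keep their defaults.

```tex
\begingroup
  \lccode`\A=`\z   % slot 65 now holds 122, so A lowercases to z
  % Slots 66 and 67 keep their defaults (b, c); slot 46 holds 0, so .
  % is left alone. Control sequences such as \xdef are never changed,
  % and { } have lccode 0, so the braces keep their grouping role.
  \lowercase{\xdef\demo{ABC.}}
\endgroup                       % the \lccode change is undone here
\message{[\meaning\demo]}       % [macro:->zbc.]
\bye
```

Since `\xdef` makes a global definition, `\demo` survives the `\endgroup` even though the modified `\lccode` entry does not.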

As usual in TeX, the same primitive that performs the assignment can be used for retrieving the value in the slot. Thus \the\lccode`\A will print 97 (under normal settings) and

\count255=\lccode`A

would store 97 in count register 255. One might even say

\count\lccode`A=42

in order to store 42 in count register 97. But this hardly seems like good usage of \lccode. ;-)
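Both uses of \lccode as a value can be checked with a minimal plain TeX sketch (assumed to be run with `tex`). The odd assignment to count register 97 is kept inside a group, since in a real document that register might already be in use.

```tex
\count255=\lccode`A          % read slot 65 of \lccode, i.e. 97
\message{[\the\count255]}    % [97]
\begingroup                  % keep the odd assignment local
  \count\lccode`A=42         % \lccode`A evaluates to 97, so this sets \count97
  \message{[\the\count97]}   % [42]
\endgroup
\bye
```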

There are several other arrays besides \lccode and \uccode: \sfcode (space factor codes), \mathcode (math codes), \delcode (delimiter codes) and \catcode (category codes).
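Each of these tables is read the same way as \lccode, with \the in front. The following is a minimal plain TeX sketch (assumed to be run with `tex`); apart from the catcode of the tilde, the printed values depend on the format, so they are not listed here.

```tex
% Each table is indexed by character code and read with \the.
\message{[catcode of tilde: \the\catcode`\~]}       % 13 (active) in plain TeX
\message{[sfcode of period: \the\sfcode`\.]}
\message{[mathcode of a: \the\mathcode`\a]}
\message{[delcode of left paren: \the\delcode`\(]}
\bye
```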
