[TeX/LaTeX] What are \lccode and \uccode used for?

luatex, pdftex, tex-core, xetex

In TeX, each of the 256 bytes has an associated \lccode and \uccode, integers in the range [0,255] that determine, among other things, how \lowercase and \uppercase act. There are of course other code tables (\mathcode and \catcode, for instance), but I am focusing here on the case-changing codes.

A look at the TeXbook tells me about the following uses of the \lccode and \uccode:

\lowercase <general text> turns each character token in the argument into a character token with the same category code, but a character code equal to the \lccode of the original character code, unless the \lccode is zero, in which case, the original character code is retained.
\uppercase <general text> behaves in the same way, using the \uccode instead.
When hyphenating, TeX considers the characters that reached its stomach (from tokens with category code 11 or 12, from \chardef'd tokens, or from \char), and defines a "letter" to be a character with non-zero \lccode. A letter is lowercase if its \lccode is equal to its character code.
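
The first two points can be checked with a minimal sketch such as the following. It assumes the default LaTeX code tables (letters paired with their case partners, digits and punctuation with \lccode and \uccode equal to 0); the macro name \demo is only an illustration.

```latex
\documentclass{article}
\begin{document}
% Only character tokens are changed; \def and \demo are left alone.
% Characters whose \lccode (or \uccode) is 0, such as the braces, the
% space, the digits and "!", keep their character code.
\lowercase{\def\demo{ABC 123!}}%
\texttt{\meaning\demo}% macro:->abc 123!

\uppercase{\def\demo{abc 123!}}%
\texttt{\meaning\demo}% macro:->ABC 123!
\end{document}
```

Wrapping the definition in \lowercase or \uppercase, rather than the text itself, makes the result visible through \meaning without relying on how the font renders the characters.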

Is this all? In particular, does TeX use the \uccode for any purpose other than the \uppercase primitive? What about other engines, pdfTeX, XeTeX, and LuaTeX?

Best Answer

The \lccode of a character is used in hyphenation when \uchyph is set to zero:

\documentclass{article}
\begin{document}

\uchyph=0 %

\begingroup
  \lccode`\C=`\C
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Capitalised word.
  \par
\endgroup

\begingroup
  \lccode`\C=`\c
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Capitalised word.
  \par
\endgroup

\begingroup
  \uccode`\C=`\C
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Capitalised word.
  \par
\endgroup

\begingroup
  \uccode`\C=`\c
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Some filler text. 
  Capitalised word.
  \par
\endgroup

\end{document}

Notice that \uchyph is therefore misleadingly named: what is tested is whether the word starts with a lowercase letter (one whose \lccode is equal to its own character code), and the \uccode plays no part.

Related Solutions

[TeX/LaTeX] The \lowercase trick

Your intuition is wrong. In TeX's syntax,

`\~

is a number, namely the ASCII code of the character appearing after the backquote (the character can also be given as a single-character control sequence, with exactly the same meaning, to accommodate cases like %). So the assignment

\lccode`\~=`\.

is just the same as

\lccode 126=46

so the lowercase counterpart of the tilde becomes the period, as far as \lowercase is concerned.

If you write

\lccode`\~=\lccode`\.

then you'd assign the tilde the \lccode of the period, which is 0 under normal settings. That is a very different thing.
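
As a minimal sketch of why the assignment \lccode`\~=`\. is useful, the following defines an active period without ever making the period active at definition time. It assumes that ~ is active, as it is in LaTeX and plain TeX; the replacement text [dot] is purely illustrative.

```latex
\documentclass{article}
\begin{document}
\begingroup
  \lccode`\~=`\.   % same as \lccode 126=46; local to this group
  % \lowercase keeps the category code (13, active) of ~ but changes
  % its character code to 46, producing an active period.
  \lowercase{\endgroup\def~}{ [dot] }%
% The group is closed before \def is executed, so \lccode`\~ is
% restored while the definition of the active period survives.
{\catcode`\.=\active One.Two.Three}
\end{document}
```

Inside the last group the periods are tokenized as active characters, so they use the definition made through \lowercase; outside it the period behaves as usual.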

A few more words may help the beginner. One can think of \lccode as an array of length 256, with its index starting at 0. The command

\lccode <number1> = <number2>

stores <number2> in the slot numbered <number1>. When \lowercase does its job and finds a character with charcode x, it looks in slot x of the \lccode array; if the number stored there is 0, \lowercase does nothing; otherwise it finds a number y > 0 and replaces the charcode x with the charcode y (not changing the catcode).

As usual in TeX, the same primitive that performs the assignment can be used for retrieving the value in the slot. Thus \the\lccode`\A will print 97 (under normal settings) and

\count255=\lccode`A

would store 97 in the count register 255. One might even say

\count\lccode`A=42

in order to store 42 in count register 97. But this doesn't seem good usage of \lccode. ;-)

There are several other arrays besides \lccode and \uccode: \sfcode (space factor codes), \mathcode (math codes), \delcode (delimiter codes) and \catcode (category codes).
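
All of these arrays can be read with \the in the same way as \lccode. The following is a small sketch under default LaTeX settings; the values given in the comments hold only if no package has changed the corresponding codes.

```latex
\documentclass{article}
\begin{document}
\the\lccode`\A\par    % 97: the lowercase partner of A is a
\the\uccode`\a\par    % 65: the uppercase partner of a is A
\the\catcode`\%\par   % 14: comment character
\the\sfcode`\.\par    % 3000, unless \frenchspacing is in effect
\the\mathcode`\+\par  % class, family and slot packed into one integer
\the\delcode`\(\par   % a negative value would mean "not a delimiter"
\end{document}
```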

Cases of different tokens having same meaning and same \string-representation that can occur in the stage of expansion

Another such case is that of the frozen font control sequence obtained by applying \the to a font command, compared with the original font command itself. The two tokens fulfill all your criteria:

\documentclass{article}
\begin{document}
\makeatletter
% Let's assume that we loaded a font at some point:
\font\cmr cmr10
% Then we can get a second token for accessing the font using \the
\edef\tokens{\the\cmr\cmr}
% Compare their \string representations:
\edef\helpI{\expandafter\expandafter\expandafter\string\expandafter\@firstoftwo\tokens}
\edef\helpII{\expandafter\expandafter\expandafter\string\expandafter\@secondoftwo\tokens}
\ifx\helpI\helpII
  They have the same \texttt{\string\string} representation.
\else
  They have different \texttt{\string\string} representation.
\fi

\expandafter\ifx\tokens
  They have the same meaning.
\else
  They have different meaning.
\fi

% Now in case some people question that they are actually different, let's change the meaning of one of them and compare again.
\let\cmr\relax
\expandafter\ifx\tokens
  They are identical.
\else
  They are different tokens.
\fi
\makeatother
\end{document}
