[Tex/LaTex] Towards a \ucshape and \textuc command for uppercase text (XeTeX, LuaTeX)

capitalization fontspec luatex opentype xetex

While we have \textit and \textsc, \itshape and \scshape etc., there's no \textuc and \ucshape to typeset text in uppercase. There is \MakeUppercase, but since uppercase is not a font style, it's not a font selection command and cannot be used like one, which makes it of no use in, e.g., formatting headings. There's the seamus-egreg workaround, but that has its limitations, for example with non-English text: \section{daß} produces »DAß« rather than »DASS«, etc.

Traditionally, the only way towards a proper \textuc/\ucshape seemed to be via creating a virtual font in which all lowercase letters are replaced by uppercase ones. While I've seen a couple of people suggesting that approach, I don't think anyone has ever done it, probably because it's tedious and inflexible, as it's always tied to one specific font.

My question is whether it might be a good idea to rethink that issue now that we have XeTeX and LuaTeX, where we can use OpenType fonts and, via fontspec, OpenType feature files. In those feature files, we can define substitution rules that can be switched on and off. In OpenType, an f_i ligature, for example, is produced by such a substitution rule, which says »whenever you come across an f followed by an i in the input, replace the two by the glyph f_i in the output«. To tentatively answer my own question: yes, it might be. Consider the following example. [Edit: it now includes a comparison with the \MakeTextUppercase command from the textcase package that barbara mentioned.]
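
As a rough illustration of the substitution rule just described, the f_i ligature could be written in feature-file syntax as below. This is only a sketch: the file name ligsketch.fea is made up, the glyph names f, i and f_i are assumptions that differ between fonts, and real fonts already ship such a rule in their own liga feature.

```latex
% Preamble fragment, written in the same filecontents style as the full example.
\begin{filecontents*}{ligsketch.fea}
languagesystem DFLT dflt;
languagesystem latn dflt;

feature liga {
  # ligature substitution: the glyph sequence "f i" becomes the single glyph f_i
  sub f i by f_i;
} liga;
\end{filecontents*}
```

The caps feature in the full example that follows turns this idea around: it takes ligature glyphs apart again and maps lowercase glyphs to their uppercase counterparts.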

\documentclass{scrartcl}
\usepackage{fontspec,blindtext,microtype,filecontents,textcase}

\begin{filecontents*}{universalcaps.fea}
languagesystem DFLT dflt;
languagesystem latn dflt;

feature caps {

lookup ligatures {
  sub f_i by F I;
  sub f_l by F L;
  sub f_f_l by F F L;
  sub f_f_i by F F I;
  sub f_f by F F;
  sub f_j by F J;
  sub f_f_j by F F J;
  sub f_t by F T;
  sub f_f_t by F F T;
} ligatures;

lookup eszett {
  sub germandbls by S S;
} eszett;

lookup single {
  sub [a-z] by [A-Z];
  sub agrave by Agrave;
  sub aacute by Aacute;
  sub acircumflex by Acircumflex;
  sub atilde by Atilde;
  sub adieresis by Adieresis;
  sub aring by Aring;
  sub ccedilla by Ccedilla;
  sub egrave by Egrave;
  sub eacute by Eacute;
  sub ecircumflex by Ecircumflex;
  sub edieresis by Edieresis;
  sub igrave by Igrave;
  sub iacute by Iacute;
  sub icircumflex by Icircumflex;
  sub idieresis by Idieresis;
  sub eth by Eth;
  sub ntilde by Ntilde;
  sub ograve by Ograve;
  sub oacute by Oacute;
  sub ocircumflex by Ocircumflex;
  sub otilde by Otilde;
  sub odieresis by Odieresis;
  sub ugrave by Ugrave;
  sub uacute by Uacute;
  sub ucircumflex by Ucircumflex;
  sub udieresis by Udieresis;
  sub yacute by Yacute;
  sub thorn by Thorn;
  sub ydieresis by Ydieresis;
  sub oe by OE;
  sub ae by AE;
  sub scaron by Scaron;
  sub zcaron by Zcaron;
} single;

} caps;
\end{filecontents*}

\setmainfont[FeatureFile=universalcaps.fea]{TeX Gyre Termes}
\setsansfont[FeatureFile=universalcaps.fea]{TeX Gyre Heros}

\newcommand*{\ucshape}{\addfontfeature{RawFeature=+caps}}
\newcommand{\textuc}[1]{{\ucshape #1}}

\setkomafont{section}{\ucshape}

\begin{document}
\section{Lorem Ipsum}

\textuc{\blindtext
àéîàáâãäåæ çèéêëìíîï ðñòóôõö ùúûüýþÿœš
fi ff fl ffi ffl fj ffj}

\MakeTextUppercase{daß}
\textuc{daß}

\begin{titlepage}
\ucshape
\begin{center}
John Doe\par
{\huge Title}
\end{center}
\end{titlepage}

\end{document}

This creates a feature file that adds a new feature called caps to whatever font is loaded; it can be turned on and off like any other feature. Note that this works with TrueType fonts such as Georgia as well; fontspec seems to apply what's in the .fea file even to non-OpenType fonts.
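
To make "turned on and off" concrete, here is a small sketch of how the switch might be used in the document body. It assumes the preamble of the example above (universalcaps.fea loaded via FeatureFile, \ucshape and \textuc defined). The name \nocapsshape is hypothetical, and whether RawFeature=-caps really overrides an earlier +caps is an assumption that should be checked before relying on it; ending the group is the safe way to switch the feature off.

```latex
% Body fragment; relies on \ucshape and \textuc from the example above.

% Switching off by grouping: the feature is active only inside the braces.
Mixed case, {\ucshape all caps inside the group,} mixed case again.

% The argument form does the same thing.
Mixed case, \textuc{all caps in the argument,} mixed case again.

% Hypothetical explicit off-switch (assumption: -caps cancels +caps).
\newcommand*{\nocapsshape}{\addfontfeature{RawFeature=-caps}}
{\ucshape all caps, {\nocapsshape mixed case,} all caps once more.}
```

Because \ucshape is an ordinary font switch here, it can also be handed to anything that expects one, as \setkomafont{section}{\ucshape} does in the example.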

  • Are there any potential drawbacks that I may not have had in mind? I'm still thinking it seems too simple to be true. (While it may be simple, it may of course still be a lot of work to take care of all lowercase glyphs in advanced multi-language fonts.)

  • Has anyone else ever done or seen something like that? I find it hard to believe I should be the first to come up with that idea…

  • If no one has done it so far, would it be a good idea to turn it into a little package?

Best Answer

A full implementation in LuaTeX:

\documentclass{article}
\usepackage{fontspec}
\usepackage{xcolor}
\usepackage{luacode}
\usepackage{luatexbase}
\usepackage{luatexbase-attr}

\newluatexattribute\uppercaseattr


\begin{luacode*}
  local ucattr = luatexbase.attributes.uppercaseattr
  local GLYPH = node.id("glyph")
  local function makeuppercase(head)
      local orighead = head
      local string = unicode.utf8
      while head do
          if head.id == GLYPH then
              local att = node.has_attribute(head,ucattr)
              if att then
                  if head.char == 223 then -- ß
                      -- insert two 'S' glyphs
                      head.char = 83
                      orighead = node.insert_before(orighead,head,node.copy(head))
                  elseif head.char == 64258 then -- fl
                      head.char = 70 -- F
                      orighead = node.insert_before(orighead,head,node.copy(head))
                      head.char = 76 -- L
                  elseif head.char == 64256 then -- ff
                      head.char = 70 -- F
                      orighead = node.insert_before(orighead,head,node.copy(head))
                      head.char = 70 -- F
                  elseif head.char == 64257 then -- fi
                      head.char = 70 -- F
                      orighead = node.insert_before(orighead,head,node.copy(head))
                      head.char = 73 -- I
                  elseif head.char == 64259 then -- ffi
                      head.char = 70 -- F
                      orighead = node.insert_before(orighead,head,node.copy(head))
                      head.char = 70 -- F
                      orighead = node.insert_before(orighead,head,node.copy(head))
                      head.char = 73 -- I
                  elseif head.char == 64260 then -- ffl
                      head.char = 70 -- F
                      orighead = node.insert_before(orighead,head,node.copy(head))
                      head.char = 70 -- F
                      orighead = node.insert_before(orighead,head,node.copy(head))
                      head.char = 76 -- L
                  else
                      head.char = string.byte(string.upper(string.char(head.char)))
                  end
              end
          end
          head = head.next
      end
      return orighead
  end

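  local function makeuppercase_hbox(head,groupcode)
      if groupcode == "adjusted_hbox" or groupcode == "hbox" then
           -- return the head reported by makeuppercase: it changes when a
           -- node is inserted before the first glyph (e.g. a leading ß)
           return makeuppercase(head)
      end
      return head
  end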

  luatexbase.add_to_callback("hpack_filter",makeuppercase_hbox,"makeuppercasehbox")
  luatexbase.add_to_callback("pre_linebreak_filter",makeuppercase,"makeuppercase")
\end{luacode*}

\newcommand*\ucshape{\uppercaseattr=1}
\DeclareTextFontCommand{\textuc}{\ucshape}

\begin{document}
\hsize 7.2cm
\newcommand\sample{Draußen \i \ij\ ffl fluffiest fish \textit{König} \textcolor{blue}{àéîàáâãäåæ} çèéêëìíîï  ff \hbox{ðñòóôõö} ùúûüýþÿœš}


Lowercase {\ucshape \sample} Lowercase

Lowercase \textuc{\sample} Lowercase

% textcolor doesn't work in LaTeX's \MakeUppercase
% Lowercase \MakeUppercase{\sample} Lowercase


\end{document}

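The program above manipulates the node list only if the given attribute is set (to any value). Attributes are grouped, just like any TeX assignment.

It's a bit more complicated than I thought because we need to treat \hbox{}es separately: the loop walks only the top level of the list it is given and does not descend into boxes, so box contents are handled by the hpack_filter callback when the box is packed.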
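
The grouping behaviour can be tried out with a few lines in the document body. This is a sketch that assumes the preamble of the LuaTeX example above (\uppercaseattr, \ucshape and the two callbacks); the sample text is made up.

```latex
% Body fragment; relies on \ucshape from the LuaTeX example above.

% The attribute is set only inside the braces.
Lowercase {\ucshape uppercase inside the group} lowercase again.

\begingroup
  \ucshape % sets \uppercaseattr for the rest of this group
  Uppercase until the group ends, \hbox{including boxed text}.
\endgroup
Lowercase: the attribute assignment was local to the group.
```

The \hbox in the second part is the case that the hpack_filter callback is meant to cover.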