[Tex/LaTex] How are the glyph (character) names in PDF-files determined

fonts, pdf, pdftex, symbols

PDF-files make internal use of glyph names. For example, the name of ≈ (U+2248; TeX \approx) appearing in a PDF-file might be approxequal.

One can find such names in a TeX-generated PDF-file by

1. compiling the TeX code with \pdfcompresslevel=0,
2. inspecting the resulting PDF-file as a text file, and
3. looking for lines starting with /CharSet.

(information taken from Ulrike Fischer's answer elsewhere, which provides more information).
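
The first step might look like the following minimal sketch for pdfLaTeX. The \pdfobjcompresslevel=0 setting is an addition not mentioned above; it is assumed here so that PDF objects are not packed into object streams and stay readable in a text editor. The \approx symbol is only sample content.

```latex
% Compile with pdflatex. Both settings are pdfTeX primitives and can be
% set before \documentclass.
\pdfcompresslevel=0     % do not compress streams
\pdfobjcompresslevel=0  % assumption: also keep objects out of object streams
\documentclass{article}
\begin{document}
Sample text with a symbol: $a \approx b$.
\end{document}
```

After compiling, opening the PDF in a text editor and searching for /CharSet should show, for each embedded Type 1 font subset, the names of the glyphs that were used.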

Apparently the glyph names are font-dependent. So they are determined by the fonts? Do all font formats use such names? Which font formats use textual names? Do all glyphs in all PDF-files have such names?

How are the glyph names in PDF-files determined? Who determined the existing ones? What are they for? Why doesn't PDF refer to the glyphs by number? Clearly some readers rely on the glyph names (see the link to the question about hyperlink detection below), so the PDF format or some readers make assumptions about these names. There must be a reason why names are used as an intermediary; perhaps it has to do with the age of Unicode relative to PDF. What else is there to know on this topic for a user of (La)TeX?

For me, the issue of PDF glyph names came up here:

- Manipulating the Unicode codepoints of glyphs in the resulting PDF-file requires knowledge of the glyph names. Notably, glyphtounicode.tex maps from glyph names to Unicode codepoints, with lines such as \pdfglyphtounicode{approximatelyequal}{2245}; see "How to fix missing or incorrect mappings from glyphtounicode.tex".
- At least one PDF reader uses glyph names in a heuristic for HTTP URL detection; see "\input{glyphtounicode} with \pdfgentounicode=1 creates unwanted hyperlinks from link-like text".
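
For reference, here is a minimal pdfLaTeX sketch of how these mappings are typically loaded and extended. The \pdfglyphtounicode line simply repeats the entry quoted above; in practice one would add such lines only for glyph names that are missing from, or wrong in, glyphtounicode.tex.

```latex
\documentclass{article}
\input{glyphtounicode}   % load the stock glyph-name -> Unicode table
\pdfgentounicode=1       % have pdfTeX write ToUnicode maps into the PDF
% Add or override a single mapping: {glyph name}{hex code point}
\pdfglyphtounicode{approximatelyequal}{2245}
\begin{document}
$a \cong b \approx c$
\end{document}
```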

A similar question is "How to find the proper glyph name required by \pdfglyphtounicode", but there is more ground that needs to be covered in this topic.

Best Answer

it's my understanding that the glyph names are determined by the font. (note use of the term "glyph"; characters and glyphs are related, but are not interchangeable. but that's another story.)

it's also my understanding that the names supplied by the font depend on the supplier of the font -- they may be "meaningful" in some way (e.g., an ascii letter, a unicode, a descriptive name, ...) or they may just be a supplier's internal code, as used to be the situation in the days of metal type (as shown in old monotype technical symbols listings).

things may change, but ... don't hold your breath.

adding to what ulrike has said, unicode also uses names as well as numbers. an important (but possibly irrelevant) point here is that, once both a name and a number are assigned, they are never changed, even should the name prove to be wrong, or just ill-advised.

a second point is that some glyphs are not necessarily named by a single unique unicode. a unicode is supposed to define meaning, not shape. "variant" glyphs (with the same meaning but different shape) may be represented by multiple unicodes, in two principal ways:

- by using a combining diacritic, as \nvarleq is a compound of \leq (U+2264) and U+20D2, "combining long vertical line overlay"; almost no relations negated by a vertical cancellation are represented by single unicodes, and unless the basic principles of unicode assignment change, this will remain the norm.
- by adding a defined "variation selector" (U+FE00) to designate variants that are officially recognized by unicode and that cannot be produced by adding a combining diacritic, such as \lvertneqq (less than but not equal to, with vertical negation of only the equals sign; U+2268, U+FE00).

unicode technical report #25, unicode support for mathematics, deals with these methods in sections 2.17 and 2.18 (pages 26 ff.).
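
in pdftex terms, such a sequence can be given as the target of a mapping, since \pdfglyphtounicode accepts several code points separated by spaces (glyphtounicode.tex does this for ligatures). a minimal sketch follows; the glyph names nvarleq and lvertneqq are assumptions made up for illustration, and the real names depend on the font:

```latex
\documentclass{article}
\input{glyphtounicode}
\pdfgentounicode=1
% hypothetical glyph names; check the font for the actual ones
\pdfglyphtounicode{nvarleq}{2264 20D2}   % U+2264 + combining long vertical line overlay
\pdfglyphtounicode{lvertneqq}{2268 FE00} % U+2268 + variation selector U+FE00
\begin{document}
text
\end{document}
```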

Related Solutions

[Tex/LaTex] Using another font for a glyph that is not available in the current font

Using two (or more) fonts in XeLaTeX (or LuaLaTeX) is very easy, since the fontspec package which handles fonts for those engines provides commands for loading new fonts and assigning macro names to them.

What you cannot do is have automatic switching from one font to another if XeTeX fails to find a glyph in a particular font.

Here's an example of font switching used for inserting phonetic characters (which many fonts don't have). The standard font that linguists use for phonetics is Doulos SIL. You can use it in XeLaTeX in the following way:

\documentclass{article}
\usepackage{fontspec}
% We are using Linux Libertine O as our main serifed font
\setmainfont{Linux Libertine O}
% now declare a command \doulos to load the Doulos SIL font
\newfontfamily\doulos{Doulos SIL}
% now create a \textIPA{} command
\DeclareTextFontCommand{\textIPA}{\doulos}
\begin{document}
Here is some text in the main font.
% We now have two ways to enter IPA characters directly in the document:
% Use the \doulos command inside a group
{\doulos [ðɪsɪzsəmfənɛtɪks]}
% or use the \textIPA command
\textIPA{[ðɪsɪzsəmfənɛtɪks]}
\end{document}

In your new example using checkmarks and crosses, you can do things similarly. Here I've used Arial Unicode MS and Zapf Dingbats to show two different versions of these characters (I don't have the Code2000 font). But the principle is exactly the same.

\documentclass{article}
\usepackage{fontspec}
\setmainfont{Arial Unicode MS}
\newfontfamily\dingbats{Zapf Dingbats}
\DeclareTextFontCommand{\textding}{\dingbats}
\begin{document}
\section{In the main font}
 (✓)  (✗)
\section{In the Dingbats font}
{ (\textding{✓})  (\textding{✗})}
\end{document}


[Tex/LaTex] Why are tfm files missing in the LaTeX rsfs package

With the help of the discussion of my question (see comments), I found a solution:

On my computer, the MiKTeX installation was quite new and the font had never been used before, so the required tfm files had not yet been created.

However, the matplotlib Python package (dviread.py) tried to locate the tfm files before they were ever used, in order to create a font file cache (see the question here: https://stackoverflow.com/questions/50875637/matplotlib-how-do-i-have-to-provide-font-metrics-files-for-rendering-text-by-te).

I have a complete TeX Live installation on another computer. There the tfm files had already been created during installation, so I just copied them to the corresponding location on my computer.
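
If the files really are missing only because the font was never used, compiling a small document that uses it once may be a simpler way to trigger their creation or installation than copying them by hand. This is an assumption, not something confirmed above; the mathrsfs package and \mathscr are the usual LaTeX interface to the rsfs fonts.

```latex
\documentclass{article}
\usepackage{mathrsfs}  % provides \mathscr, set in the rsfs fonts
\begin{document}
$\mathscr{A}\mathscr{B}\mathscr{C}$
\end{document}
```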
