For a fairly large (> 100 pages) document that I am writing, I have run pdffonts
to check whether the fonts are suitably embedded. The output is as follows:
C:\>pdffonts main.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
PEUMGT+Utopia-Regular                Type 1C           yes yes no      10  0
QIAYNS+Utopia-Bold                   Type 1C           yes yes no       8  0
XUFKIZ+Utopia-Italic                 Type 1C           yes yes no      61  0
CVIUTI+Fourier-Math-Letters-Italic   Type 1C           yes yes no     270  0
YJVFRW+Fourier-Math-Symbols          Type 1C           yes yes yes    282  0
LPRTGE+Fourier-Math-Extension        Type 1C           yes yes yes    332  0
UYVFMY+Fourier-Math-Letters-Bold-Italic Type 1C           yes yes no     592  0
I have been using the fourier package (\usepackage{fourier}) so as to have the Fourier fonts, which I like a lot both for math and for regular text. I see from the font output table that some Utopia fonts are embedded as well; these are from Adobe, as mentioned in the fourier package documentation. I have three questions:
- I would like to know what the "random" letters before the font name in the table mean (e.g. on line one we have PEUMGT).
- I would like to learn how to interpret the font output table better. In the second-to-last column, the final row has the number 592. What does this mean?
- Where can I find more information on the pdffonts command?
Best Answer
AFAICS, questions No. 1+2 have not yet been fully answered...
1. 'I would like to know what the "random" letters before the font name in the table mean (e.g. on line one we have PEUMGT).'
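For context: the PDF specification requires the name of a subsetted font to begin with a tag of six uppercase letters followed by a plus sign, which fits the sub column reading "yes" for every font in the table. A minimal Python sketch that separates such a tag from the base font name (the function name split_subset_tag is made up for illustration):

```python
import re

# Subset tag: exactly six uppercase letters, then "+", then the base font name.
_SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")

def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None if the name carries no subset tag."""
    match = _SUBSET_TAG.match(font_name)
    if match is None:
        return None, font_name
    return match.group(1), match.group(2)

print(split_subset_tag("PEUMGT+Utopia-Regular"))
print(split_subset_tag("Helvetica"))
```

The letters themselves carry no meaning; they only keep different subsets of the same font apart within one file.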
2. 'I would like to learn how to interpret the font output table better. In the second-to-last column, the final row has the number 592. What does this mean?'

The number is the ID of the PDF object that defines the font: object number 592, generation 0. Open the PDF in a text editor and search for the line 592 0 obj, ending with the line endobj. Everything in between defines this object. However, some other objects may be referenced: if you find a string such as 691 0 R, you know to look for object 691, generation 0, in the same way as you looked for object 592 initially.

To see where the 592 object is used, search for all occurrences of 592 0 R.
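The text-editor search described above can also be scripted. The following Python sketch is only an approximation: it assumes the objects are stored uncompressed in the file body (objects packed into compressed object streams will not be found this way), and the helper names and the sample bytes are invented for illustration.

```python
import re

def find_object(pdf_bytes, num, gen=0):
    """Return the raw bytes between 'num gen obj' and the next 'endobj', or None."""
    pattern = re.compile(rb"(?<!\d)%d %d obj\b(.*?)endobj" % (num, gen), re.DOTALL)
    match = pattern.search(pdf_bytes)
    return match.group(1).strip() if match else None

def find_references(pdf_bytes, num, gen=0):
    """Return the byte offset of every 'num gen R' reference."""
    pattern = re.compile(rb"(?<!\d)%d %d R\b" % (num, gen))
    return [m.start() for m in pattern.finditer(pdf_bytes)]

# Made-up fragment standing in for a real file read with open(path, "rb").read().
sample = (
    b"592 0 obj\n<< /Type /Font /FontDescriptor 691 0 R >>\nendobj\n"
    b"7 0 obj\n<< /Font << /F1 592 0 R >> >>\nendobj\n"
)
print(find_object(sample, 592))      # body of object 592
print(find_references(sample, 592))  # where object 592 is referenced
```

The lookbehind keeps a search for 592 from matching inside a longer number such as 1592.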
Update:
3. What do the values in the uni column mean? (Actually, not asked by the OP, but added by myself because it fits the context... :)

Values in the uni column indicate whether the font in question is accompanied by a /ToUnicode table (a separate object in the PDF, if present). This table provides a reverse mapping from "character codes" to Unicode characters or code points.

Without a correct and valid /ToUnicode table, any text extraction will most likely fail for Custom-encoded fonts and result in unreadable garbage; pdftotext, for example, will not work as expected.

You can test this by opening any PDF in a text editor. If you find the string /ToUnicode inside, change it to /toUnicode. (That change in capitalization means the case-sensitive keyword will no longer be found.) After that, your PDF will still display identically, but text extraction will no longer work (for those fonts which the disabled /ToUnicode tables were serving).

[You may now ask: how is the text then still displayed correctly inside the viewer? The reason is that the forward mapping of the (Unicode) characters or code points (to the glyph shapes to be drawn for them) uses a different mechanism... see point 4.]
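The capitalization test from point 3 can be scripted instead of done by hand. This Python sketch assumes the /ToUnicode keys appear as plain text in the file (they will not if the font dictionaries sit inside compressed object streams); the function name disable_tounicode and the sample bytes are made up for illustration.

```python
def disable_tounicode(pdf_bytes):
    """Rename every /ToUnicode key to /toUnicode; return (new_bytes, count)."""
    count = pdf_bytes.count(b"/ToUnicode")
    # Same length before and after, so byte offsets elsewhere in the file stay valid.
    return pdf_bytes.replace(b"/ToUnicode", b"/toUnicode"), count

# Made-up font dictionary standing in for a real file's contents.
sample = b"10 0 obj\n<< /Type /Font /ToUnicode 12 0 R >>\nendobj\n"
patched, count = disable_tounicode(sample)
print(count)
print(patched)
```

Because only one letter changes case, the file keeps its exact size, which is why the edited PDF still opens and displays as before.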
4. Newer versions of pdffonts display an additional column, encoding.

What do the values in the encoding column mean? (Question also added by me :)

The 'encoding' of a font actually represents the forward mapping mentioned in point 3: from a "character code" to the glyph ID inside the font, so that the PDF renderer knows how to draw the particular glyph representing the char code. (Note that the technical term "char code" in this context is not the same as "letter" or "character". The "char code" for expressing the letter "a" in a PDF text object may be "z", or anything else.)
There are various mechanisms for font encodings:
/Differences array to the final font encoding.

So why do PDFs not always use the 'builtin' encodings? Because they do not always embed the full font program. Sometimes they embed only the subset of glyphs which actually occur in the PDF document.
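As a sketch of how a /Differences array modifies a base encoding: in the PDF specification, each integer in the array sets the current char code, and each glyph name that follows is assigned to consecutive codes from there. The Python below models that rule with plain dicts and lists; the function name and the tiny base encoding are assumptions made for illustration, not a real PDF parser.

```python
def apply_differences(base_encoding, differences):
    """Overlay a /Differences-style list on a base {char code: glyph name} map."""
    encoding = dict(base_encoding)
    code = None
    for item in differences:
        if isinstance(item, int):
            # An integer restarts the numbering at that char code.
            code = item
        else:
            if code is None:
                raise ValueError("glyph name before any char code")
            # A glyph name is assigned to the current code, then the code advances.
            encoding[code] = item
            code += 1
    return encoding

base = {65: "A", 66: "B", 67: "C"}  # made-up excerpt of a base encoding
final = apply_differences(base, [65, "Adieresis", 200, "fi", "fl"])
print(final)
```

Codes not mentioned in the array (66 and 67 here) keep whatever the base encoding assigned to them.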