[TeX/LaTeX] Merging duplicate embedded fonts

embedding fonts pdf pdftex

I use many PDF images (converted from SVG) in my LaTeX document (which I create with pdflatex).

I noticed that there are a lot of duplicate embedded subset fonts in my final document. I know this is caused by the included images.

After searching around, I noticed that merging embedded fonts is a hassle. I do not really understand why, as a tool could combine these subsets, remove all entries with the same encoding and modify the text blocks accordingly (but let's not discuss this further, I'll believe the Internet 🙂 ). I gave up on the embedded subset fonts.

Now I regenerated all images making sure they do not have a subset of the font, but contain the full font. (I checked this with pdffonts.)

After building the document again, it still has the duplicate (non-subset) fonts:

$> pdffonts mydoc.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
...
DejaVuSans                           TrueType          WinAnsi          yes no  yes   1066  0
DejaVuSans-Bold                      TrueType          WinAnsi          yes no  yes   1067  0
DejaVuSans                           TrueType          WinAnsi          yes no  yes   1117  0
DejaVuSans-Oblique                   TrueType          WinAnsi          yes no  yes   1118  0
DejaVuSans                           TrueType          WinAnsi          yes no  yes   1136  0
DejaVuSans-Oblique                   TrueType          WinAnsi          yes no  yes   1137  0
DejaVuSans-Oblique                   TrueType          WinAnsi          yes no  yes   1243  0
...

My question is: why are there duplicate fonts?! It must be very easy for pdflatex to get rid of these duplicate fonts…
Or do I need to provide some flag to pdflatex? Or add a package to my document?

Update

The requested document can be downloaded here and is built using:

\documentclass{article}
\usepackage{graphicx}

\begin{document}
\includegraphics{image1}
\includegraphics{image2}
\end{document}

If desired, the full project including both test images and the SVG originals can be found here.

The PDF images contain the texts 'Image1' and 'Image2' and both have their font fully embedded.
pdffonts shows the following information for the final document:

$ pdffonts mydoc.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
SDXKYB+CMR10                         Type 1            Builtin          yes yes no       6  0
DejaVuSans                           TrueType          WinAnsi          yes no  yes     11  0
DejaVuSans                           TrueType          WinAnsi          yes no  yes     17  0

DejaVuSans is embedded twice, once for each used image. My real document has loads of images, so the problem is more severe…
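For a document with many images, reading the pdffonts listing by eye gets tedious. The following is a minimal Python sketch that scans such a listing and reports fonts embedded more than once. It assumes the column layout shown above (name, type, encoding, emb, sub, uni, object ID) and font names without spaces; the function name `duplicate_fonts` and the `SAMPLE` text are only illustrative.

```python
import re
from collections import defaultdict

# One pdffonts row: name, type, encoding, emb, sub, uni, object number, generation.
# The type may contain a space ("Type 1"), so it is matched lazily.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<type>.+?)\s+(?P<enc>\S+)\s+"
    r"(?P<emb>yes|no)\s+(?P<sub>yes|no)\s+(?P<uni>yes|no)\s+"
    r"(?P<obj>\d+)\s+(?P<gen>\d+)\s*$"
)


def duplicate_fonts(pdffonts_output):
    """Return {font name: [object numbers]} for fonts embedded more than once."""
    objects = defaultdict(list)
    for line in pdffonts_output.splitlines():
        m = ROW.match(line.strip())
        # Header, separator and "..." lines do not match and are skipped.
        if m and m["emb"] == "yes":
            objects[m["name"]].append(int(m["obj"]))
    return {name: ids for name, ids in objects.items() if len(ids) > 1}


# The listing of the small test document from above.
SAMPLE = """\
name            type      encoding  emb sub uni object ID
--------------- --------- --------- --- --- --- ---------
SDXKYB+CMR10    Type 1    Builtin   yes yes no       6  0
DejaVuSans      TrueType  WinAnsi   yes no  yes     11  0
DejaVuSans      TrueType  WinAnsi   yes no  yes     17  0
"""

for name, ids in duplicate_fonts(SAMPLE).items():
    print(f"{name}: embedded {len(ids)} times (objects {ids})")
```

Subset fonts carry a random prefix such as `SDXKYB+`, so two subsets of the same font would not be grouped by this sketch; it only catches fully embedded copies like the DejaVuSans entries here.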

I do not know whether it is relevant, but pdflatex says:

This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013/Debian)

Best Answer

pdfTeX merges only fonts in Type 1 (or Type 1C) format. image1.pdf and image2.pdf contain TrueType fonts, which are copied as they are, once per included file. From the sources of pdfTeX, pdftoepdf.cc:

static void copyFont(char *tag, Object * fontRef)
{
    ...
    // Only handle included Type1 (and Type1C) fonts; anything else will be copied.
    // Type1C fonts are replaced by Type1 fonts, if REPLACE_TYPE1C is true.
    ...
}
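Read against the pdffonts listings in the question, the comment explains the duplicates: every DejaVuSans entry has type TrueType. The rule can be written down as a small Python sketch. This is only a simplified reading of the source comment, not pdfTeX's actual logic; the name `copied_verbatim` is hypothetical, and the type strings follow the "type" column printed by pdffonts.

```python
# Font types that, per the pdftoepdf.cc comment, pdfTeX handles itself.
HANDLED_TYPES = {"Type 1", "Type 1C"}


def copied_verbatim(font_type):
    """True if a font of this type is simply copied from each included PDF."""
    return font_type not in HANDLED_TYPES


for font_type in ("Type 1", "Type 1C", "TrueType"):
    verdict = "copied as-is" if copied_verbatim(font_type) else "handled by pdfTeX"
    print(f"{font_type}: {verdict}")
```

Under this reading, every included image that embeds a TrueType font contributes its own copy to the final document.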