[TeX/LaTeX] Why does pdflatex produce bigger output files than latex+dvipdfm?

compiling, dvi-mode, dvipdfmx, file size, pdftex

Consider this sample code:

\documentclass{article}
\usepackage{lipsum}

\begin{document}
\lipsum[1-4]
\end{document}

When I compile this with latex and then use dvipdfm, the output file is 7893 bytes. When I use pdflatex, the output PDF is a whopping 20696 bytes. Naturally, the two outputs are visually indistinguishable from one another.

Why does this happen? What does pdflatex put in there that takes so much space?

For reference, I used the latest MiKTeX 2.9 on Windows 7 and ran the commands without any extra switches.

Best Answer

Martin Heller has stated the correct answer: dvipdfm uses a different font format than pdftex. You can look into the PDF file by loading it into a text editor. Sometimes (well, often), the objects are compressed and you only see some data. So you either need a decompression algorithm built into your head, or use a tool like qpdf to uncompress the objects (that is what I do):

qpdf --qdf --object-streams=disable test-pdflatex.pdf test-pdflatex-long.pdf

Now the output file is much more readable, and you can compare the output of dvipdfm and pdftex. I don't know if this applies to all cases, but in this example you can take a look at the font descriptor object:

% dvipdfm:
9 0 obj
<<
  /FontFile3 11 0 R
  /Ascent 694
  /CapHeight 683
  /Descent -194
  /Flags 6
  /FontBBox [-40 -250 1009 750 ]
  /FontName /DJLCQW+CMR10
  /ItalicAngle 0
  /StemV 69
  /Type /FontDescriptor
>>
endobj

and

% pdftex:
9 0 obj
<<
  /FontFile 11 0 R
  /Ascent 694
  /CapHeight 683
  /CharSet (/A/C/D/E/I/L/M/N/P/Q/S/U/V/a/b/c/comma/d/e/f/g/h/hyphen/i/j/l/m/n/o/one/p/period/q/r/s/t/u/v/w/y)
  /Descent -194
  /Flags 4
  /FontBBox [ -40 -250 1009 750 ]
  /FontName /QJZLYL+CMR10
  /ItalicAngle 0
  /StemV 69
  /Type /FontDescriptor
  /XHeight 431
>>
endobj

The two descriptors use different entries to refer to the font file (/FontFile3 and /FontFile). According to Table 126, "Embedded font organization for various font types", in the PDF specification, the entry /FontFile refers to a Type 1 font program, while /FontFile3 refers to whatever format the /Subtype of the referenced stream names. So we need to take a look at object #11 in the dvipdfm file:
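The same check can be done without reading the file by eye. The following is a minimal Python sketch, not part of the original answer, that scans the bytes of a qpdf-expanded PDF for font-file references and labels each one. The function name find_font_files and the sample bytes are made up for illustration. The regular expression assumes the simple "/Key N G R" layout shown in the listings above.

```python
import re

# Font-file keys a FontDescriptor may carry, and what each one points at.
FONT_FILE_KEYS = {
    b"/FontFile": "Type 1 font program",
    b"/FontFile2": "TrueType font program",
    b"/FontFile3": "format named by /Subtype of the referenced stream",
}

# Matches e.g. "/FontFile3 11 0 R"; captures the key and the object number.
FONT_FILE_RE = re.compile(rb"(/FontFile[23]?)\s+(\d+)\s+\d+\s+R")


def find_font_files(pdf_bytes):
    """Return (key, object number, meaning) for each font-file reference."""
    return [
        (m.group(1).decode(), int(m.group(2)), FONT_FILE_KEYS[m.group(1)])
        for m in FONT_FILE_RE.finditer(pdf_bytes)
    ]


# Illustrative input: excerpts of the two descriptors quoted above.
sample = b"<<\n  /FontFile3 11 0 R\n>>\n<<\n  /FontFile 11 0 R\n>>\n"
for key, obj, meaning in find_font_files(sample):
    print(key, "-> object", obj, "-", meaning)
```

For a real file, pass the result of open("test-pdflatex-long.pdf", "rb").read() instead of the sample bytes.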

11 0 obj
<<
  /Subtype /Type1C
  /Length 12 0 R
>>
stream
....
endstream
endobj

So it is Type1C, which the same table in the PDF spec describes as a "Type 1–equivalent font program represented in the Compact Font Format (CFF), as described in Adobe Technical Note #5176, The Compact Font Format Specification."

To find out what the secret of CFF is, a look at the introduction of "The Compact Font Format Specification" suffices:

Principal space savings are a result of using a compact binary representation for most of the information, sharing of common data between fonts, and defaulting frequently occurring data.
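To see how much of each file the embedded font programs take up, one could measure the font streams in the two qpdf-expanded files. The Python sketch below is an illustration under stated assumptions, not something from the original answer. The function name font_stream_sizes is hypothetical. It assumes the "N G obj ... stream ... endstream" layout that qpdf's QDF mode writes, and it assumes every referenced font object really is a stream. The sizes it reports are for the uncompressed streams, so they only indicate the trend and will not add up to the on-disk file sizes.

```python
import re

# Matches e.g. "/FontFile3 11 0 R"; captures the referenced object number.
FONT_FILE_RE = re.compile(rb"/FontFile[23]?\s+(\d+)\s+\d+\s+R")


def font_stream_sizes(pdf_bytes):
    """Map each font-file object number to the byte length of its stream.

    Assumes an uncompressed, QDF-style PDF in which every object header
    starts at the beginning of a line.
    """
    sizes = {}
    for ref in FONT_FILE_RE.finditer(pdf_bytes):
        obj_num = int(ref.group(1))
        # Locate "<obj_num> <gen> obj", skip its dictionary, capture the data.
        pattern = re.compile(
            rb"^" + str(obj_num).encode()
            + rb"\s+\d+\s+obj\b.*?>>\s*stream\r?\n(.*?)\r?\nendstream",
            re.M | re.S,
        )
        match = pattern.search(pdf_bytes)
        if match is not None:
            sizes[obj_num] = len(match.group(1))
    return sizes
```

Running this on test-pdflatex-long.pdf and on the corresponding expanded dvipdfm file, then comparing the totals, should show whether the Type 1 versus CFF font programs explain most of the size gap.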