A common approach is to let Ghostscript (gs) optimize and compress the PDF after it has been created with pdflatex.
Ghostscript is installed by most Linux distributions and is easily available for other platforms (Windows as binaries, macOS via MacPorts). In fact, almost all size-optimizing tools for PDF that you can find on the internet (save for Acrobat) use Ghostscript internally, so you may as well call it directly.
There is a plethora of options available; I personally use the following:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dNOPAUSE -dQUIET -dBATCH -dPrinted=false -sOutputFile=foo-compressed.pdf foo.pdf
I use this mostly for beamer presentations, where it gets me a size reduction of 60–70 percent. (A 10 MiB lecture note becomes 3–4 MiB in size.)
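To check what the command saves on a particular file, it can be wrapped in a small shell function that prints the sizes before and after. This is only a sketch: the helper names compress_pdf and percent_saved are made up for illustration, and the percentage is rounded down to a whole number.

```shell
# Sketch only: compress_pdf and percent_saved are hypothetical helper names.

# percent_saved OLD_BYTES NEW_BYTES
# Prints the size reduction as an integer percentage (rounded down).
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# compress_pdf INPUT OUTPUT
# Runs gs with the options shown above, then reports both file sizes.
compress_pdf() {
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dNOPAUSE -dQUIET -dBATCH \
       -dPrinted=false -sOutputFile="$2" "$1" || return 1
    # The arithmetic expansion strips the padding some wc versions print.
    old=$(( $(wc -c < "$1") ))
    new=$(( $(wc -c < "$2") ))
    echo "$1: $old bytes, $2: $new bytes, $(percent_saved "$old" "$new")% saved"
}
```

Usage would be compress_pdf foo.pdf foo-compressed.pdf. Nothing is run when the functions are defined, so they can be put into a shell startup file.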
Edit 2020-02-06: Added -dPrinted=false to preserve hyperlinks.
Edit 2020-09-10: Changed -dCompatibilityLevel from 1.4 to 1.5 as pdflatex outputs PDF 1.5 by default since 2010.
Martin Heller has stated the correct answer: dvipdfm uses a different font format than pdftex. You can look into the PDF file by loading it into a text editor. Sometimes (well, often), the objects are compressed and you only see binary data. So you either need a decompression algorithm built into your head, or you use a tool like qpdf to uncompress the objects (that is what I do):
qpdf --qdf --object-streams=disable test-pdflatex.pdf test-pdflatex-long.pdf
Now the output file is much more readable and you can compare the output of dvipdfm and pdftex. I don't know if this applies to all cases, but in this example you can take a look at the font object:
% dvipdfm:
9 0 obj
<<
/FontFile3 11 0 R
/Ascent 694
/CapHeight 683
/Descent -194
/Flags 6
/FontBBox [-40 -250 1009 750 ]
/FontName /DJLCQW+CMR10
/ItalicAngle 0
/StemV 69
/Type /FontDescriptor
>>
endobj
and
% pdftex:
9 0 obj
<<
/FontFile 11 0 R
/Ascent 694
/CapHeight 683
/CharSet (/A/C/D/E/I/L/M/N/P/Q/S/U/V/a/b/c/comma/d/e/f/g/h/hyphen/i/j/l/m/n/o/one/p/period/q/r/s/t/u/v/w/y)
/Descent -194
/Flags 4
/FontBBox [ -40 -250 1009 750 ]
/FontName /QJZLYL+CMR10
/ItalicAngle 0
/StemV 69
/Type /FontDescriptor
/XHeight 431
>>
endobj
Both have different entries referring to the font file (/FontFile3 and /FontFile). According to Table 126, "Embedded font organization for various font types", in the PDF specification, the entry /FontFile refers to a Type 1 font program and /FontFile3 to whatever the subtype in the referenced stream is. So we need to take a look at object #11 in the dvipdfm file:
11 0 obj
<<
/Subtype /Type1C
/Length 12 0 R
>>
stream
....
endstream
endobj
So it is Type1C, which according to the same table in the PDF spec is a "Type 1–equivalent font program represented in the Compact Font Format (CFF), as described in Adobe Technical Note #5176, The Compact Font Format Specification."
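The comparison of the font file entries can also be done with grep instead of reading the objects by eye. The following is a minimal sketch that assumes the file has already been uncompressed with qpdf as shown earlier; list_fontfile_keys is a hypothetical helper name, and it is a plain text search, not a PDF parser.

```shell
# Sketch only: list_fontfile_keys is a hypothetical helper name.

# list_fontfile_keys FILE
# Prints each distinct font file key (/FontFile, /FontFile2, /FontFile3)
# that occurs in FILE, one per line.
list_fontfile_keys() {
    # -a: treat the PDF as text even though stream data is binary
    # -o: print only the matching key, not the whole line
    grep -a -o -E '/FontFile[23]?' "$1" | sort -u
}
```

Running list_fontfile_keys test-pdflatex-long.pdf should then list /FontFile for the pdftex output from this example, and the same call on the uncompressed dvipdfm file should list /FontFile3. Because this is a byte-level search, a match inside binary stream data is possible in principle, so it is worth confirming the result in a text editor.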
To find out what the secret of CFF is, a look at the introduction of "The Compact Font Format Specification" suffices:
Principal space savings are a result of using a compact binary representation for most of the information, sharing of common data between fonts, and defaulting frequently occurring data.
Best Answer
The TeX Live version has compressed object streams. That indicates that MiKTeX and TeX Live have different settings for \pdfobjcompresslevel. Compressed object streams are a lossless compression method in which not just a single object is compressed, but a range of objects together. This leads to smaller files. Why one distribution would set it as a default and the other would not is beyond my knowledge. And without a MiKTeX installation to verify, I can only assume that they have different defaults.
When uncompressed, they still don't have the same size:
That's still about 1% difference.
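Whether a given PDF contains object streams can be checked without a TeX installation, because the dictionary of an object stream carries the entry /Type /ObjStm in plain text even when its contents are compressed. The following is a rough sketch; uses_object_streams is a hypothetical helper name, and it assumes the PDF is not encrypted or otherwise rewritten in a way that hides this entry.

```shell
# Sketch only: uses_object_streams is a hypothetical helper name.

# uses_object_streams FILE
# Succeeds (exit status 0) if FILE contains at least one object stream.
uses_object_streams() {
    # -a: treat the PDF as text, -q: no output, only the exit status
    grep -a -q -E '/Type[[:space:]]*/ObjStm' "$1"
}
```

For example, uses_object_streams test-pdflatex.pdf && echo "has object streams" prints the message only for a file that uses them. A file rewritten with qpdf --object-streams=disable, as in the qpdf call shown earlier, should no longer match.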