PDF Optimization – How to Create Small PDF Files Optimized for the Internet


I think PDF documents intended for upload to the Internet should be as small as possible in bytes. Several strategies help to produce such small documents:

Pay attention to the size of the embedded raster images.
Create diagrams with TikZ or PSTricks and avoid external applications that can produce needlessly large files. TikZ and PSTricks give almost unlimited control over the properties of each object. For instance, a point A with coordinates (1,1) is preferable to a point A with coordinates (1.0000000,1.00000000), a level of precision that many applications do not let you control.
Pay attention to the fonts used in your documents. Some fonts ship a single design size that is scaled when used in titles and the like. Other fonts ship separate designs for different sizes, which makes the document a bit larger. The difference can be non-negligible.
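To make the point about coordinate precision concrete, here is a minimal Python sketch. It assumes the drawing code is available as plain text; the helper name shorten_numbers and the choice of two decimals are hypothetical, not part of TikZ or PSTricks.

```python
import re

# Matches decimal literals such as 1.0000000 or -0.2500000.
_NUMBER = re.compile(r"-?\d+\.\d+")

def shorten_numbers(text, digits=2):
    """Round every decimal literal in text and drop trailing zeros."""
    def repl(match):
        value = round(float(match.group(0)), digits)
        short = f"{value:.{digits}f}".rstrip("0").rstrip(".")
        # Avoid emitting "-0" for tiny negative values.
        return "0" if short in ("-0", "") else short
    return _NUMBER.sub(repl, text)

verbose = "(1.0000000,1.00000000) -- (2.5000000,0.3333333)"
compact = shorten_numbers(verbose)
print(compact)
print(len(verbose), "->", len(compact), "characters")
```

Every digit saved in the source also tends to be a byte saved in the uncompressed PDF content stream, although stream compression narrows the gap.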

I am interested in this third point: has anyone investigated which fonts give the smallest PDF files?

Edit: Compress.SmallPDF looks like a good free online solution for compressing PDF files efficiently.

Best Answer

There are a number of tricks for getting optimized PDFs. Many of them are implemented in the tool pdfsizeopt. With some patches (posted in the pdfsizeopt bug tracker) this tool can run on all my TeX-generated PDFs (and nearly all of the non-TeX-generated ones). I use the command line:

python ./pdfsizeopt.py --use-pngout=true --use-jbig2=true --use-multivalent=true --do-unify-fonts=false filetocompress.pdf

I use --do-unify-fonts=false even though it produces slightly larger PDFs, because of a bug where a few glyphs are not displayed in certain PDF viewers (Adobe Reader on Windows, for example).
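If you run this often, a small wrapper can build the command and report the saving. The following is only a sketch: the function names are hypothetical, the flags are exactly those quoted above, and it assumes that a "python" interpreter able to run pdfsizeopt.py is on the PATH.

```python
import subprocess

def pdfsizeopt_command(pdf_path, script="./pdfsizeopt.py", interpreter="python"):
    """Return the argument list for the command line quoted above."""
    return [
        interpreter, script,
        "--use-pngout=true",
        "--use-jbig2=true",
        "--use-multivalent=true",
        "--do-unify-fonts=false",  # works around the missing-glyph bug
        pdf_path,
    ]

def percent_saved(before_bytes, after_bytes):
    """Percentage by which the file shrank (negative if it grew)."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return 100.0 * (before_bytes - after_bytes) / before_bytes

def optimize(pdf_path):
    """Run pdfsizeopt; raises CalledProcessError if the tool fails."""
    subprocess.run(pdfsizeopt_command(pdf_path), check=True)

print(" ".join(pdfsizeopt_command("filetocompress.pdf")))
```

Feed percent_saved the two file sizes (for example from os.path.getsize) to compare the input with whatever output file your pdfsizeopt run produced.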

There are indeed various things you can do during document production with tex, to make sure that the compressed pdf ends up as small as possible: several of these are discussed in the EuroTeX 2009 White paper about pdfsizeopt (available at https://github.com/pts/pdfsizeopt/releases/download/docs-v1/pts_pdfsizeopt2009.psom.pdf).

As regards fonts, pdfsizeopt will recode fonts to the very compact CFF format, and take care of subsetting and duplication issues. I haven't investigated deeply, but in my tests it seems that, of the two options for Type 1 TeX fonts in the T1 (multilingual) encoding, the Latin Modern fonts generally produce significantly larger PDFs than the CM-Super version (which is unfortunate, because Latin Modern is superior in just about every other way; see this question). I just did a quick experiment, and this difference in size seems to apply only to the pre-pdfsizeopt PDFs: after pdfsizeopt, Latin Modern is the same size as or smaller than CM-Super.

Using fonts that don't have optical scaling will indeed produce a smaller PDF, but I don't recommend it because if you are using multiple sizes then the non-optically scaled fonts will look much worse.

BatchA.bat

rem batchA.bat
rem echo off
latex -interaction=nonstopmode %1
dvips -R -t unknown %1
ps2pdf -dAutoRotatePages#/None -dCompatibilityLevel#1.5 -dPDFSETTINGS#/prepress %1.ps %1-temp.pdf
pdfcrop --restricted --hires %1-temp %1.pdf
pdftops -level3 -eps %1.pdf
rem acrord32 %1.pdf
del %1.log
del %1.aux
del %1.dvi
del %1.ps
del %1-temp.pdf

It can be used for an input file with any paper size.
Steps: TEX -> DVI -> PDF -> cropping -> EPS.
pdfcrop takes a significant amount of time to do cropping. If you don't need cropping, don't use this batch file.

BatchB.bat

rem batchB.bat
echo off
latex -interaction=nonstopmode %1
dvips -R -t unknown %1
ps2pdf -dAutoRotatePages#/None -dCompatibilityLevel#1.5 -dPDFSETTINGS#/prepress %1.ps
pdftops -level3 -eps %1.pdf
rem acrord32 %1.pdf
del %1.log
del %1.aux
del %1.dvi
del %1.ps

It can be used only for an input file with tight paper size.
Steps: TEX -> DVI -> PDF -> EPS.
It runs faster than batchA.bat because it skips cropping with pdfcrop. The drawback is that you must specify a tight paper size yourself.

BatchC.bat

rem batchC.bat
echo off
latex %1
dvips -t unknown %1
gswin32c -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=%1.pdf %1.ps
pdftops -eps %1.pdf
rem acrord32 %1.pdf
del %1.log
del %1.aux
del %1.dvi
del %1.ps

It is almost the same as batchB.bat but with fewer switches to speed up the compilation.
It can be used only for an input file with tight paper size.
Steps: TEX -> DVI -> PDF -> EPS.
It runs faster than batchA.bat because it skips cropping with pdfcrop. The drawback is that you must specify a tight paper size yourself.

BatchD.bat

rem batchD.bat
echo off
tex %1
dvips -t unknown %1
gswin32c -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=%1.pdf %1.ps
pdftops -eps %1.pdf
rem acrord32 %1.pdf
del %1.log
del %1.aux
del %1.dvi
del %1.ps

It is almost the same as batchC.bat but uses tex.exe instead of latex.exe. The input file must be in plain TeX format. I hope it turns out faster than batchC.bat, but it has not been tested yet because I have had a lot of trouble converting the input file from LaTeX to plain TeX. Benchmarking will be done soon.
It can be used only for an input file with tight paper size.
Steps: TEX -> DVI -> PDF -> EPS.
It runs faster than batchA.bat because it skips cropping with pdfcrop. The drawback is that you must specify a tight paper size yourself.

BatchE.bat

rem batchE.bat
rem echo off
latex -interaction=nonstopmode %1
dvips -R -t unknown -E %1 -o %1-temp.eps
epstool --copy --bbox %1-temp.eps %1.eps
epstopdf --hires %1.eps
rem acrord32 %1.pdf
del %1.log
del %1.aux
del %1.dvi
del %1-temp.eps

It can be used for an input file with any paper size.
Steps: TEX -> DVI -> EPS -> bounding box correction -> PDF.
The resulting EPS file is larger than the one produced by any of the first four batch files (A, B, C, D).

BatchF.bat

rem batchF.bat
echo off
latex -interaction=nonstopmode %1
dvips -R -t unknown %1 -o %1-temp.ps
ps2eps %1-temp.ps
epstool --copy --bbox %1-temp.eps %1.eps
epstopdf --hires %1.eps
rem acrord32 %1.pdf
del %1.log
del %1.aux
del %1.dvi
del %1-temp.ps
del %1-temp.eps

It can be used for an input file with any paper size.
Steps: TEX -> DVI -> PS -> EPS -> bounding box correction -> PDF.
The resulting EPS file is larger than the one produced by any of the first four batch files (A, B, C, D).
I no longer remember the benchmark result comparing batchE.bat and batchF.bat.

Automate.bat

rem automate.bat
rem it takes a single character from {a,b,c,d,e,f}.
rem the options are case-insensitive.
rem for example: automate a
rem another example: automate F
echo off
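rem "call" returns control to this script after each batch file, so the final pause is reached.
for %%x in (*.tex) do call batch%1.bat %%~nx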
pause

If you have any problem, drop a comment.

Update in response to your misuse.

Each of {batchA.bat,batchB.bat,batchC.bat,batchD.bat,batchE.bat,batchF.bat} takes one input file name without extension. For example, if you want to use batchE.bat for helloworld.tex, then you must type batchE helloworld and hit enter.

If you want to use batchE.bat for a set of input files located in the same directory as the batch files, type automate E and hit enter.
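For readers who prefer a scripting language, the dispatch logic of automate.bat can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and running the result assumes Windows with the batch files in the current directory.

```python
from pathlib import Path
import subprocess

VALID_LETTERS = "abcdef"

def batch_commands(directory, letter):
    """Return one [batch file, input name without extension] pair per .tex file."""
    letter = letter.lower()  # the option is case-insensitive, as in automate.bat
    if len(letter) != 1 or letter not in VALID_LETTERS:
        raise ValueError("expected a single letter from a to f")
    batch = f"batch{letter.upper()}.bat"
    return [[batch, tex.stem] for tex in sorted(Path(directory).glob("*.tex"))]

def run_all(directory, letter):
    """Run the chosen batch file on every input (Windows only)."""
    for command in batch_commands(directory, letter):
        subprocess.run(["cmd", "/c", *command], cwd=directory, check=True)

for command in batch_commands(".", "e"):
    print(" ".join(command))
```

As with the batch files themselves, each input name is passed without its extension.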

TeX – How to Create Linearized PDF

The fast web view is simply a method of allowing content to be displayed as it is being downloaded.

It will not, however, make the download itself any faster, so the word "fast" in the name can be misleading.

You should use qpdf --linearize as noted by @MartinSchroeder (pdfopt is deprecated as noted in the comments).

pdfcrop will also do that for you, but it does additional work as well, namely cropping your PDF.
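A quick way to see whether a file came out linearized is to look for the /Linearized key, which the PDF specification requires to appear within the first 1024 bytes of a linearized file. The sketch below is a heuristic under that assumption, not a full validation; the function names are hypothetical, and qpdf itself offers a stricter check via its --check-linearization option.

```python
def qpdf_linearize_command(src, dst):
    """Argument list for linearizing src into dst with qpdf."""
    return ["qpdf", "--linearize", src, dst]

def looks_linearized(path):
    """Heuristic: a PDF header plus /Linearized near the start of the file."""
    with open(path, "rb") as handle:
        head = handle.read(1024)
    return head.startswith(b"%PDF-") and b"/Linearized" in head

print(" ".join(qpdf_linearize_command("in.pdf", "out.pdf")))
```

Pass the command list to subprocess.run to perform the conversion, then call looks_linearized on the output file.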