[TeX/LaTeX] A basic question about the raw PDF file output by pdfTeX

pdf pdftex

Consider the following simple TeX file, to be compiled with pdflatex.

\pdfcompresslevel=0
\documentclass{standalone}
\begin{document}
\hrulefill
\end{document}

Thanks to \pdfcompresslevel=0, the resulting PDF file is human-readable. It contains the following lines:

4 0 obj <<
/Type /ObjStm
/N 4
/First 22
/Length 257
>>
stream
2 0 1 105 5 139 6 191
% 2 0 obj
<<
/Type /Page
/Contents 3 0 R
/Resources 1 0 R
/MediaBox [0 0 343.711 0.398]
/Parent 5 0 R
>>
% 1 0 obj
<<
/ProcSet [ /PDF ]
>>
% 5 0 obj
<<
/Type /Pages
/Count 1
/Kids [2 0 R]
>>
% 6 0 obj
<<
/Type /Catalog
/Pages 5 0 R
>>
endstream
endobj

As far as I understand, a reference like 5 0 R points to the object defined by 5 0 obj. Such an object does seem to be defined, but with a leading %, which is supposed to be the comment character in the PDF format.

So two questions.

  1. Is my analysis right?
  2. Why does pdftex output such comments (I mean why does it output % 2 0 obj)?

Best Answer

What you see is not a completely uncompressed PDF. To get one, you have to say

\pdfcompresslevel=0
\pdfobjcompresslevel=0

The PDF specification provides for two different levels of compression.

The first one, controlled by the value of \pdfcompresslevel in the pdftex driver, deals with the data of stream objects (page content streams, XObjects, embedded files and others), that is, the data enclosed between the stream and endstream keywords.
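To make this first level concrete: with a nonzero \pdfcompresslevel, pdftex writes stream data deflated (announced by /Filter /FlateDecode in the stream dictionary), and such data can be inflated again with any zlib implementation. The following is only a rough Python sketch of that round trip. The content stream in it is a made-up placeholder, not data taken from the PDF in the question.

```python
import zlib

# Hypothetical page content stream (a filled rectangle). This is an
# illustrative placeholder, not the stream of the PDF discussed here.
content = b"q\n0 0 343.711 0.398 re f\nQ\n"

# Roughly what a stream body looks like with a nonzero \pdfcompresslevel:
# zlib/deflate data, which is no longer human-readable.
packed = zlib.compress(content, 9)

# Inflating the data recovers the readable PDF operators.
restored = zlib.decompress(packed)
print(restored.decode("ascii"))
```

With \pdfcompresslevel=0 this step is simply skipped, which is why the stream data in the question can be read directly.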

Secondly, the non-stream objects of a PDF file can themselves be packed into the streams of container objects of type /ObjStm (object streams, available since PDF 1.5). This is controlled by \pdfobjcompresslevel.

Since, in the example code, \pdfcompresslevel is set to zero, the stream of the /ObjStm object No. 4, as listed in the question, is not compressed, and one can see the objects that are packed into it. The first line, immediately after the stream keyword, lists for each packed object its object number and its byte offset within the stream, counted from the position given by /First. Objects inside an object stream are stored without the obj and endobj keywords, so the lines starting with % are indeed comments. pdftex writes these commented object numbers only for information.
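As an illustration of how that first line is used, here is a minimal Python sketch that splits the stream from the question into its four objects. It assumes that every line of the listing ends in a single line feed and carries no trailing spaces. That assumption fits the dictionary, since the bytes then add up to exactly /Length 257 and the header occupies exactly /First 22 bytes. The function name split_object_stream is made up for this example.

```python
# Payload of the /ObjStm object from the question (between "stream" and
# "endstream"), retyped under the assumption of single LF line endings.
N = 4        # /N: number of packed objects
FIRST = 22   # /First: byte offset of the first packed object

STREAM = (
    b"2 0 1 105 5 139 6 191\n"
    b"% 2 0 obj\n<<\n/Type /Page\n/Contents 3 0 R\n/Resources 1 0 R\n"
    b"/MediaBox [0 0 343.711 0.398]\n/Parent 5 0 R\n>>\n"
    b"% 1 0 obj\n<<\n/ProcSet [ /PDF ]\n>>\n"
    b"% 5 0 obj\n<<\n/Type /Pages\n/Count 1\n/Kids [2 0 R]\n>>\n"
    b"% 6 0 obj\n<<\n/Type /Catalog\n/Pages 5 0 R\n>>\n"
)


def split_object_stream(data, n, first):
    """Return {object number: raw bytes} for an uncompressed object stream."""
    # The header holds n pairs "object-number offset" in the first bytes.
    header = [int(tok) for tok in data[:first].split()]
    if len(header) != 2 * n:
        raise ValueError("header does not contain N (number, offset) pairs")
    numbers = header[0::2]
    offsets = header[1::2]
    # Offsets count from /First; the last object runs to the end of the data.
    ends = offsets[1:] + [len(data) - first]
    return {
        num: data[first + start:first + end]
        for num, start, end in zip(numbers, offsets, ends)
    }


objects = split_object_stream(STREAM, N, FIRST)
for num, body in objects.items():
    # Show each object number next to the first line of its slice.
    print(num, body.decode("ascii").splitlines()[0])
```

Under these assumptions every slice starts with its own "% n 0 obj" line, so the offsets written by pdftex point at the comment lines. A PDF reader skips them like any other comment and then finds the dictionary that a reference such as 5 0 R resolves to.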