[Tex/LaTex] Is pdftex 1.40.13 outputting malformed pdfs?

errors pdf pdftex

I have several preexisting projects that produced valid pdf output on versions of pdftex up to 1.40.10. Now with version 1.40.13 I'm getting pdfs that the iText library complains are malformed. My question is whether I'm analyzing the situation correctly and whether it looks like the bug is in pdftex or iText.

The following demonstrates the difference in behavior on two systems, one with the old version and one with the new. The pdftk utility parses the pdf using the iText library:

|||| LM  me $ pdflatex --version | head -1
pdfTeX 3.1415926-1.40.10-2.2 (TeX Live 2009/Debian)
|||| LM  me $ pdflatex -shell-escape -interaction=nonstopmode me >/dev/null
|||| LM  me $ pdftk me.pdf cat output /dev/null
|||| LM  me $ 

---- rintintin me $ pdflatex --version | head -1
pdfTeX 3.1415926-2.4-1.40.13 (TeX Live 2012/Debian)
---- rintintin me $ pdflatex -shell-escape -interaction=nonstopmode me >/dev/null
---- rintintin me $ pdftk me.pdf cat output /dev/null
Error: Failed to open PDF file: 
   me.pdf
Errors encountered.  No output created.
Done.  Input errors, so no output created.

This is with identical (freshly synchronized) source files.

This doesn't happen with all my projects; for example, I can't reproduce it with a simple "hello, world" document. I've posted the file me.pdf that generates the error here: http://www.lightandmatter.com/pdftk_bug/

Pdftk doesn't report the type of error, but it's possible to find out what it is by invoking an iText tool from the command line:

---- rintintin me $ java -cp /usr/share/java/itext1-1.4.jar com.lowagie.tools.plugins.InspectPDF me.pdf
trailer not found.

The old version of pdftex outputs pdf 1.4, the new 1.5. The iText library is supposed to be able to handle pdf versions up to 1.6.
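
One quick way to confirm which version each file declares is to read its header comment. This is only a sketch: it assumes the version of interest is the one in the leading %PDF-x.y line (a /Version entry in the document catalog can override it, and this does not look for one), and the function name pdf_header_version is made up for illustration.

```python
import re
from typing import Optional


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version string from the %PDF-x.y header, or None if absent."""
    # The header normally sits at byte 0, but readers tolerate a little
    # leading junk, so search the first 1024 bytes instead of requiring
    # an exact prefix.
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None


# Example use on a real file (path is whatever you want to inspect):
#   with open("me.pdf", "rb") as f:
#       print(pdf_header_version(f.read(1024)))
```
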

Examining the ends of the two pdf files in a text editor, there does seem to be a clear difference in the trailers:

Pdf output by old version:

trailer
<< /Size 3577
/Root 3575 0 R
/Info 3576 0 R
/ID [<F3E9A772E7220505119484B4C1B5059E> <F3E9A772E7220505119484B4C1B5059E>] >>
startxref
38327286
%%EOF

Pdf output by the new version:

3499 0 obj <<
/Type /XRef
/Index [0 3500]
/Size 3500
/W [1 4 1]
/Root 3497 0 R
/Info 3498 0 R
/ID [<B51C8DB4F9B12A880DBBE0625164918E> <B51C8DB4F9B12A880DBBE0625164918E>]
/Length 9886
/Filter /FlateDecode
>>
stream
x\3325\234w|\224U\376\2663!\275@z#      !!=\201\220\204^D\222^PR\200\320^RJ^R\322CHDAW]u-\253^B
...lots of binary data snipped for brevity...
w*\374;\233\344\231A\367H\230^?6\3120\233\356t\230\303^C\246\317c^E\362\314\213;^S\346\343\307d\230G7\223\353fr\335$
endstream
endobj
startxref
37919910
%%EOF

The second to last line is supposed to be a byte offset to the cross-reference section.

|||| LM  me $ od -a -j 38327286 me.pdf | head -1
222151766   x   r   e   f  nl   0  sp   3   5   7   7  nl   0   0   0   0

---- rintintin me $ od -a -j 37919910 me.pdf | head -1
220516246   3   4   9   9  sp   0  sp   o   b   j  sp   <   <  nl   /   T

In the output from the old version, the byte offset points to the string "xref," which seems right. In the output from the new version, it points to something that looks different, perhaps a new style of xref used in pdf 1.5? When I compile a simple "hello, world" document with the new pdftex, it has a format that looks similar to this, but iText doesn't choke on it.
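
The od check above can be automated. The following is a minimal sketch, not a PDF parser: it finds the last startxref keyword, reads the decimal offset after it, and looks at the bytes at that offset to see whether they begin a classic xref table or an indirect object whose dictionary says /Type /XRef. The function name classify_xref and the returned labels are my own, and the 1024-byte window for finding /Type /XRef is an arbitrary assumption.

```python
import re


def classify_xref(data: bytes) -> str:
    """Describe what the final startxref offset in a PDF points to."""
    # The last "startxref" keyword is followed by a byte offset in decimal.
    idx = data.rfind(b"startxref")
    if idx < 0:
        return "no startxref"
    m = re.match(rb"startxref\s+(\d+)", data[idx:idx + 64])
    if not m:
        return "startxref without offset"
    offset = int(m.group(1))

    # Inspect a window starting at the offset (window size is an assumption).
    target = data[offset:offset + 1024]
    if target.startswith(b"xref"):
        # Old style: the keyword "xref", then the table and a "trailer" dict.
        return "classic xref table"
    if re.match(rb"\d+\s+\d+\s+obj\b", target):
        # New style: an ordinary indirect object holding the table as a stream.
        if re.search(rb"/Type\s*/XRef\b", target):
            return "cross-reference stream"
        return "indirect object (not /XRef)"
    return "unrecognized"
```

Called as classify_xref(open("me.pdf", "rb").read()), this should distinguish the two layouts shown above, though it says nothing about whether the stream's contents are well formed.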

I haven't dug into the pdf specs or the source code of iText, so I don't know whether this is actually a malformed pdf or if it's valid pdf 1.5 that iText just isn't accepting. Ghostscript and poppler both accept the file without complaint.

Best Answer

The PDF you put online is a perfectly valid PDF 1.5, according to Adobe Acrobat X Preflight.

Starting with PDF 1.5, PDFs can contain compressed object streams and cross-reference streams (the /Type /XRef object at the end of your file is one). Please update your version of pdftk or file a bug with them.
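
To see whether a given file actually uses these PDF 1.5 features, a rough byte scan for the relevant dictionary markers is often enough. This sketch assumes an unencrypted file and relies on stream dictionaries being stored as plain text even when the stream data is compressed; it could in principle miscount if the same bytes happened to occur inside binary stream data. The function name pdf15_features is hypothetical.

```python
import re


def pdf15_features(data: bytes) -> dict:
    """Count dictionary markers for the stream types introduced in PDF 1.5."""
    return {
        # Object streams pack many small objects into one compressed stream.
        "object_streams": len(re.findall(rb"/Type\s*/ObjStm\b", data)),
        # Cross-reference streams stand in for the classic xref table.
        "xref_streams": len(re.findall(rb"/Type\s*/XRef\b", data)),
    }
```

A file in which both counts are zero uses only the older layout; nonzero counts mean the reader has to support the PDF 1.5 constructs.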