[TeX/LaTeX] Why do PDF files produced by pdflatex not pass validation?

hyperxmp, luatex, pdf, pdf-a, pdftex

Why do simple PDF files compiled with pdflatex contain so many PDF errors?

The simplest example I can think of is a "Hello World!" document:

\documentclass{report}
\begin{document}
    Hello World!
\end{document}

Compiling it with pdflatex produces a file "hello.pdf", which I then feed to the PDFBox preflight tool:

$ java -jar ../preflight-app-1.8.10.jar hello.pdf

The file hello.pdf is not valid, error(s) :
1.2.1 : Body Syntax error, Single space expected
1.2.1 : Body Syntax error, Single space expected
1.2.1 : Body Syntax error, EOL expected before the 'endobj' keyword
1.2.1 : Body Syntax error, Single space expected
1.2.1 : Body Syntax error, Single space expected
1.2.1 : Body Syntax error, Single space expected
7.1 : Error on MetaData, Missing Metadata Key in catalog

LuaLaTeX shows the same problem, but in general with fewer errors.

Best Answer

The PDFBox preflight tool validates against the PDF/A-1b standard, which is stricter than plain PDF. A file can therefore be a perfectly usable PDF and still fail this check.

The first errors (the 1.2.1 body syntax errors) should be resolved by a recent pdfTeX; I do not get them with pdfTeX 1.40.16 (TeX Live).

The last error, 7.1 (missing Metadata key in the catalog), arises because PDF/A-1b also requires the metadata to be embedded in XMP format. See the hyperxmp package, the latest version of the pdfx package (which supports PDF/A), or the xmpincl package.
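
As a minimal sketch of the pdfx route, assuming a reasonably recent pdfx is installed: the package reads its metadata from a file named \jobname.xmpdata, which can be written from within the document itself. The title and author values below are placeholders, and I have not checked this exact file against preflight 1.8.10.

% Write the metadata file that pdfx looks for (values are placeholders)
\begin{filecontents*}{\jobname.xmpdata}
\Title{Hello World}
\Author{Jane Doe}
\end{filecontents*}
\documentclass{report}
% The a-1b option requests PDF/A-1b output, including the XMP metadata stream
\usepackage[a-1b]{pdfx}
\begin{document}
    Hello World!
\end{document}

After compiling with pdflatex, the result can be checked again with the same preflight command as in the question.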
