[TeX/LaTeX] How to create identical PDF files with xelatex

pdf xetex

The question of how to create identical PDF files with pdflatex has essentially been answered already in this question:

How to create identical PDF files?

The trick was to filter out the /ID entry in the trailer dictionary after the PDF file was created.

I am now forced to use xelatex, and there this trick no longer works, because the resulting PDF files differ in many more places.

Here is the minimal working example again:

\documentclass[11pt,a4paper]{article}
\usepackage[]{hyperref}
\hypersetup{
        pdfauthor={None},
        pdfcreationdate={D:20131010120000},
        pdfmoddate={D:20131010120000}
}
\begin{document}
foo 
\end{document}

If I now create two PDF files, convert them to hex dumps, and view the differences…

xelatex mwe.tex; mv mwe.pdf a.pdf; xxd a.pdf > a.bin
xelatex mwe.tex; mv mwe.pdf b.pdf; xxd b.pdf > b.bin
diff -u a.bin b.bin 

… then you can see that, unfortunately, there are many differences in the resulting PDF files.
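To put a number on "many differences", the differing bytes can be counted directly without the hex dump. This is only a sketch: count_diff_bytes is a hypothetical helper name, and it relies on the standard cmp -l listing one line per differing byte position.

```shell
# Hypothetical helper: count the byte positions at which two files differ.
# cmp -l prints one line per differing byte; wc -l counts those lines.
# Bytes past the end of the shorter file are not counted.
count_diff_bytes() {
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}

# Usage (with the files produced above):
#   count_diff_bytes a.pdf b.pdf
```

A result of 0 for two files of the same size means they are bitwise identical.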

Is there any way to produce bitwise identical PDF files with xelatex?

The reason I need this is that I'd like to create PDF files for a software release package. Of course, the software release package should be identical when the underlying software and documentation have not been changed.

Best Answer

With recent TeX Live releases you can use

SOURCE_DATE_EPOCH=0 FORCE_SOURCE_DATE=1 xelatex pp407

You may also want to set

SOURCE_DATE_EPOCH_TEX_PRIMITIVES=1

The first setting, SOURCE_DATE_EPOCH, is the epoch: the number of seconds since 1 January 1970.

E.g., for today (rather than 0, which is 1 January 1970) you could use 1505482364, as found by:

$ date +%s
1505482364
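To check which date an epoch value stands for, or to get the epoch for a fixed release date, the conversion can be done in both directions. A small sketch, assuming GNU date (the -d option); the function names epoch_to_utc and utc_to_epoch are made up for illustration. The second call uses the date from the pdfcreationdate in the question.

```shell
# Assumes GNU date. On BSD/macOS, `date -u -r SECONDS` replaces `date -u -d @SECONDS`.

# Hypothetical helper: epoch seconds -> UTC timestamp.
epoch_to_utc() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical helper: UTC date string -> epoch seconds.
utc_to_epoch() {
    date -u -d "$1" +%s
}

epoch_to_utc 1505482364              # prints 2017-09-15T13:32:44Z
utc_to_epoch '2013-10-10 12:00:00'   # prints 1381406400
```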

The second, if set to 1 (or anything), causes TeX commands such as \year to use the specified epoch date rather than the system clock.

The combination of the two produces reproducible results (after the second run) in TeX Live 2017 using the supplied test file.

First run:

SOURCE_DATE_EPOCH=0 FORCE_SOURCE_DATE=1 xelatex pp407

produces:

Package hyperref Warning: Rerun to get /PageLabels entry.

Second run:

SOURCE_DATE_EPOCH=0 FORCE_SOURCE_DATE=1 xelatex pp407

produces:

...
Output written on pp407.pdf (1 page).
Transcript written on pp407.log.

Save the PDF:

cp pp407.pdf pp407-a.pdf

Run again (same command as before):

Output written on pp407.pdf (1 page).
Transcript written on pp407.log.

Compare:

cmp pp407.pdf pp407-a.pdf

No output from cmp, confirming that the files are identical.
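The manual steps above (run until the auxiliary files settle, save a copy, run again, compare) can be wrapped in a small script for a release pipeline. This is a sketch under assumptions, not something TeX Live provides: check_reproducible is a hypothetical helper, the ".first" suffix for the saved copy is arbitrary, and a single warm-up run is assumed to be enough, as it was for the test file here.

```shell
# Hypothetical helper (not part of TeX Live): warm up once, build twice, compare.
# Usage: check_reproducible OUTPUT_FILE COMMAND [ARGS...]
# Returns 0 if both builds are byte-identical, 1 if they differ,
# 2 if a build or the copy fails.
check_reproducible() {
    out=$1
    shift
    epoch=${SOURCE_DATE_EPOCH:-0}   # default to 0, as in the commands above

    # Warm-up run so auxiliary files (e.g. .aux, .out) settle first.
    SOURCE_DATE_EPOCH=$epoch FORCE_SOURCE_DATE=1 "$@" >/dev/null || return 2

    # First real build: keep a copy of the result.
    SOURCE_DATE_EPOCH=$epoch FORCE_SOURCE_DATE=1 "$@" >/dev/null || return 2
    cp "$out" "${out}.first" || return 2

    # Second real build: compare silently against the saved copy.
    SOURCE_DATE_EPOCH=$epoch FORCE_SOURCE_DATE=1 "$@" >/dev/null || return 2
    cmp -s "$out" "${out}.first"
}
```

With the test file it might be called as: check_reproducible pp407.pdf xelatex -interaction=nonstopmode pp407. Documents that need more than one rerun to settle would need additional warm-up runs.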
