[Tex/LaTex] Embedding a PDF file with clickable external links into a LaTeX document

Tags: embedding, links, pdf

I want to embed a screenshot of a web page into a LaTeX document as a vector graphic with clickable links.

I'm using wkhtmltopdf to render the web page as a letter-sized pdf file with working links. Then, I'm using this command to crop the pdf file while retaining the links:

gs -sDEVICE=pdfwrite -o "screenshot_cropped.pdf" -c "[/CropBox [`gs -sDEVICE=bbox -dBATCH -dNOPAUSE "screenshot.pdf" 2>&1 | awk '/HiRes/ { print $2,$3,$4,$5 }'`] /PAGES pdfmark" -f "screenshot.pdf"

(Commands taken from this thread on comp.text.tex; I can't use pdfcrop, since it apparently removes the links)

Finally, I'm embedding the file into my LaTeX document with the following:

\includegraphics[width=6 in]{screenshot_cropped.pdf}

Unfortunately, the links no longer work in the resulting pdf file. How can I fix this? I'm using XeLaTeX if it matters.

Best Answer

Quoting the pdfpages manual (page 2):

[...] all kinds of links¹ will get lost during inclusion. (Using \includepdf, \includegraphics, or other low-level commands.)

However, there's a gleam of hope. Some links may be extracted and later reinserted by a package called pax which can be downloaded from CTAN [3]. Have a look at it!

The pax package mentioned is actually a Java program which reads the PDF file, extracts the links and writes their positions to a .pax file. This information can be processed by an extended version of \includegraphics which reinserts the links at the correct position. However, the package is considered experimental and only works with pdfLaTeX.

This is how to use pax:

1. Download and install the package from CTAN.
2. Run pdfannotextractor.pl --install. This downloads and installs PDFBox, a Java library necessary for using pax.
3. Run pdfannotextractor.pl <filename.pdf> to read the links from any given PDF file and write them to filename.pax.
4. In your LaTeX document, invoke \usepackage{pax}. This extends \includegraphics so that it reads and processes the auxiliary .pax file.
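Because pax only works with pdfLaTeX, a document that may also be compiled with XeLaTeX (as in the question) could guard the package loading. The following is only a sketch: it assumes the iftex package is installed and that falling back to a plain inclusion without links is acceptable. The file name screenshot is taken from the test case in this answer.

```latex
\documentclass{article}
\usepackage{iftex}     % provides the engine test \ifPDFTeX
\usepackage{graphicx}

\ifPDFTeX
  % pdfLaTeX: pax extends \includegraphics and reads screenshot.pax
  \usepackage{pax}
\else
  % Assumption: under other engines we just warn and include the PDF as usual;
  % the links will be missing from the output.
  \typeout{Warning: pax requires pdfLaTeX; links in included PDFs will be lost.}
\fi

\begin{document}
  \includegraphics[scale=0.9]{screenshot}
\end{document}
```

Compiled with pdfLaTeX, this behaves like a document that loads pax directly. With XeLaTeX it still compiles, but the included page has no clickable links.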

Regarding the cropping of the PDF file with Ghostscript: In my experiments, this seemed to confuse pax, resulting in wrong positions of the hyperlinks. So it's probably best to call wkhtmltopdf with

wkhtmltopdf -B 0mm -L 0mm -R 0mm -T 0mm

in order to not create any margin from the start.

This is my working test case (using the site https://tex.stackexchange.com/about):

Creation of the PDF and the annotation file:

wkhtmltopdf -B 0mm -L 0mm -R 0mm -T 0mm https://tex.stackexchange.com/about screenshot.pdf
pdfannotextractor.pl screenshot.pdf

LaTeX source code:

\documentclass{article}
\usepackage[margin=0mm]{geometry}
\usepackage{pax}

% Visible squares for demonstration purposes, can be removed without harm
% (change \iftrue to \iffalse or remove the following lines altogether)
\iftrue
  \usepackage{hyperref}
  \hypersetup{
    filebordercolor={1 1 0},
  }
\fi

\begin{document}
    \includegraphics[scale=0.9]{screenshot}
\end{document}

Result:

[Image: output result, with visible squares denoting hyperlinks]

(The cyan boxes are there to prove that hyperlinks work, and can be removed without consequences.)

Related Solutions

[Tex/LaTex] Embedding files into a PDF document with dvipdfmx

OK, I figured it out myself: The package navigator works fine with pdflatex and dvipdfmx and supports embedding files through the \embeddedfile macro:

\documentclass{article}

\usepackage{navigator}
\embeddedfile{sourcecode}{\jobname.tex}

\begin{document}
The document
\end{document}

The syntax of the command is

\embeddedfile[<optional description>]{<object name>}[<optional filename displayed in the viewer>]{<file>}
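As an illustration of the two optional arguments, here is a variant of the example above. The description text and the displayed file name source.tex are made up for this sketch; only the argument order comes from the syntax line.

```latex
\documentclass{article}

\usepackage{navigator}
% [description]{object name}[file name shown in the viewer]{file on disk}
\embeddedfile[LaTeX source of this document]{sourcecode}[source.tex]{\jobname.tex}

\begin{document}
The document
\end{document}
```

The viewer's attachment list should then show source.tex with the given description, while the file actually read from disk is still \jobname.tex.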

[Tex/LaTex] Embedding LaTeX for PDF generation

This is fairly straightforward:

1. Get a minimal (La)TeX distribution such as w32tex (if the license allows it) or kerTeX (http://www.kergis.com/en/kertex.html), which should be okay anywhere since it's MIT licensed.
2. TeX a sample document which includes every element you plan to support.
3. Use a utility like http://ctan.org/pkg/snapshot to get a list of the files needed.
4. Put all of the files in a directory within your application and configure the TeX binary to look for files there first.
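The snapshot step might look like the following sketch. It assumes snapshot's documented usage: the package is loaded with \RequirePackage before \documentclass and writes the dependency list to \jobname.dep. The packages loaded here are placeholders for whatever your application actually supports.

```latex
\RequirePackage{snapshot} % must come before \documentclass
\documentclass{article}

% Placeholder packages: load everything the application is meant to support
\usepackage{graphicx}
\usepackage{hyperref}

\begin{document}
Sample text exercising every element you plan to support.
\end{document}
```

After one compilation, the .dep file next to the document should list the class and package files that were loaded. That list is what needs to be copied into the application's directory.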