[TeX/LaTeX] Embedding a PDF file with clickable external links into a LaTeX document

Tags: embedding, links, pdf

I want to embed a screenshot of a web page into a LaTeX document as a vector graphic with clickable links.

I'm using wkhtmltopdf to render the web page as a letter-sized pdf file with working links. Then, I'm using this command to crop the pdf file while retaining the links:

gs -sDEVICE=pdfwrite -o "screenshot_cropped.pdf" -c "[/CropBox [`gs -sDEVICE=bbox -dBATCH -dNOPAUSE "screenshot.pdf" 2>&1 | awk '/HiRes/ { print $2,$3,$4,$5 }'`] /PAGES pdfmark" -f "screenshot.pdf"

(Commands taken from this thread on comp.text.tex; I can't use pdfcrop, since it apparently removes the links)
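The one-liner above is easier to follow when split into its two steps: measure the bounding box, then write it back as a CropBox. The following is only a sketch using the same Ghostscript options as above; the function names `hires_bbox` and `crop_pdf` are made up for illustration, and it assumes a single-page PDF (the bbox device prints one line per page).

```shell
#!/bin/sh
# Hypothetical helper: pull the four HiResBoundingBox numbers out of the
# text printed by Ghostscript's bbox device (read from stdin).
hires_bbox() {
    awk '/HiRes/ { print $2, $3, $4, $5 }'
}

# Hypothetical helper: crop a PDF to its bounding box via a CropBox pdfmark.
# Assumes gs is on PATH and the input has a single page.
crop_pdf() {
    src=$1
    dst=$2
    # The bbox device reports on stderr, hence the 2>&1.
    box=$(gs -sDEVICE=bbox -dBATCH -dNOPAUSE "$src" 2>&1 | hires_bbox)
    gs -sDEVICE=pdfwrite -o "$dst" \
       -c "[/CropBox [$box] /PAGES pdfmark" \
       -f "$src"
}
```

With these defined, `crop_pdf screenshot.pdf screenshot_cropped.pdf` should behave like the command above.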

Finally, I'm embedding the file into my LaTeX document with the following:

\includegraphics[width=6 in]{screenshot_cropped.pdf}

Unfortunately, the links no longer work in the resulting pdf file. How can I fix this? I'm using XeLaTeX if it matters.

Best Answer

Quoting the pdfpages manual (page 2):

[...] all kinds of links will get lost during inclusion. (Using \includepdf, \includegraphics, or other low-level commands.)

However, there's a gleam of hope. Some links may be extracted and later reinserted by a package called pax which can be downloaded from CTAN [3]. Have a look at it!

The pax package mentioned is actually a Java program which reads the PDF file, extracts the links and writes their positions to a .pax file. This information can be processed by an extended version of \includegraphics which reinserts the links at the correct position. However, the package is considered experimental and only works with pdfLaTeX, so the XeLaTeX setup from the question would have to switch engines for this approach.

This is how to use pax:

  • Download and install the package from CTAN.
  • Run pdfannotextractor.pl --install. This downloads and installs PDFBox, a Java library necessary for using pax.
  • Now you can run pdfannotextractor.pl <filename.pdf> in order to read the links from any given PDF file and write them to filename.pax.
  • In your LaTeX document, invoke \usepackage{pax}. This extends \includegraphics in order to read and process the auxiliary .pax file.

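The extraction step above has to be repeated whenever the PDF changes, so it can be handy to wrap it in a small shell helper. This is only a sketch: the function names `pax_name` and `ensure_pax` are hypothetical, and it assumes pdfannotextractor.pl is on your PATH and that the --install step has already been run.

```shell
#!/bin/sh
# Hypothetical helper: map a PDF path to the .pax file that
# pdfannotextractor.pl writes next to it (filename.pdf -> filename.pax).
pax_name() {
    printf '%s\n' "${1%.pdf}.pax"
}

# Hypothetical helper: run the extractor only if the .pax file is missing
# or older than the PDF. Assumes pdfannotextractor.pl is on PATH.
ensure_pax() {
    pdf=$1
    pax=$(pax_name "$pdf")
    if [ ! -f "$pax" ] || [ "$pdf" -nt "$pax" ]; then
        pdfannotextractor.pl "$pdf"
    fi
}
```

A typical call would be `ensure_pax screenshot.pdf && pdflatex document.tex`, where document.tex is a placeholder for your own file.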
Regarding the cropping of the PDF file with Ghostscript: In my experiments, this seemed to confuse pax, resulting in wrong positions of the hyperlinks. So it's probably best to call wkhtmltopdf with

wkhtmltopdf -B 0mm -L 0mm -R 0mm -T 0mm  

in order to not create any margin from the start.


This is my working test case (using the site https://tex.stackexchange.com/about):

Creation of the PDF and the annotation file:

wkhtmltopdf -B 0mm -L 0mm -R 0mm -T 0mm https://tex.stackexchange.com/about screenshot.pdf
pdfannotextractor.pl screenshot.pdf

LaTeX source code:

\documentclass{article}
\usepackage[margin=0mm]{geometry}
\usepackage{pax}

% Visible squares for demonstration purposes, can be removed without harm
% (change \iftrue to \iffalse or remove the following lines altogether)
\iftrue
  \usepackage{hyperref}
  \hypersetup{
    filebordercolor={1 1 0},
  }
\fi

\begin{document}
    \includegraphics[scale=0.9]{screenshot}
\end{document}

Result:

[Image: output result, with visible squares denoting hyperlinks]

(The cyan boxes are there to prove that hyperlinks work, and can be removed without consequences.)
