[Tex/LaTex] hyperref links break with pdftex + babel + Hebrew (or right-to-left language)

Tags: babel, cross-referencing, hebrew, hyperref, right-to-left

Using the babel package to write Hebrew text exposes incompatibilities with all sorts of other packages. This question is about the incompatibility with hyperref.

Basically, you can't get links with right-to-left text. It's about the direction rather than the non-Latin language – somehow the link-start command is placed at the end due to some sort of reversal. The problem is described in Guy Rutenberg's blog, here.

Here's an MWE for \cite and \ref:

\documentclass{article}
\usepackage[hebrew,english]{babel}
\usepackage{hyperref}
\begin{document}
\section{A section}
\label{mysection}
LTR English cite \cite{MYSRC}. And now in RTL Hebrew:

\selectlanguage{hebrew}
\cite{MYSRC}
\selectlanguage{english}

Let's refer to the current section:

\selectlanguage{hebrew}
\ref{mysection}

\begin{thebibliography}{MYSRC}
\bibitem[MYSRC01]{MYSRC}
The bibliography entry for MYSRC.
\end{thebibliography}
\end{document}

For both of these (and for \autoref), you get:

! pdfTeX error (ext4): pdf_link_stack empty, \pdfendlink used without \pdfstart
link?.
\AtBegShi@Output ...ipout \box \AtBeginShipoutBox 
                                                  \fi \fi 

Notes:

  • Vafa Khalighi's comment below may be useful in isolating the minimum offending code out of everything 'babel' does, although I can't say for sure.
  • The blog entry I linked to has a workaround – which only works with xetex. Can it be adapted somehow?
  • Stefan Kottwitz suggested a workaround which won a bounty on this question. But what I would really like is to make hyperref get such links correctly somehow.

Best Answer

This is not an answer. I was able to produce the minimal working example as:

\documentclass{article}
\usepackage{hyperref}
\TeXXeTstate=1
\def\neweverypar{{\setbox0\lastbox\beginR\usebox0}}
\let\origeverypar=\everypar
\def\everypar#1{\origeverypar{\neweverypar#1}}
\begin{document}
This is \href{http://google.com}{Google} and ...
\end{document}

After running pdflatex on this, you get exactly the same error message. I have also forwarded this to Heiko Oberdiek, and if he answers me I will post his answer here. I think, though, that this is a limitation of the pdfTeX engine's TeX--XeT implementation (the algorithm already has many annoying bugs), and it may turn out not to be fixable at all. XeTeX and pdfTeX both use TeX--XeT, but hyperref uses pdfTeX primitives for hypertext under pdfTeX, whereas under XeTeX it uses \special.

Edit: This is the verbatim response of Heiko Oberdiek:

I don't know. AFAIK there isn't even an easy way to test, whether \beginR or \beginL is active. A workaround could be to put two labels with some distance to find out the writing direction, thus that the order of \pdfstartlink and \pdfendlink could be switched if necessary. But that kind of workaround does not scale, the hash table size in TeX is limited, large document with many links will too easily hit the limit.
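The two-label idea in the paragraph above can be tried out in isolation. The following is only a rough sketch, not Heiko's code: it assumes the zref-savepos package for recording positions, and \probedirection and \showdirection are hypothetical helper names invented for this illustration. It stores the horizontal position of two markers 1pt apart and compares them on the next run; inside reflected text the first marker should end up to the right of the second.

```latex
% Compile twice with pdflatex: positions are only known on the second run.
\documentclass{article}
\usepackage{zref-savepos}
\TeXXeTstate=1 % enable the TeX--XeT primitives \beginR, \endR

% Hypothetical helper: two position markers separated by a 1pt kern.
\newcommand{\probedirection}[1]{%
  \zsaveposx{#1-a}\kern1pt\zsaveposx{#1-b}%
}

% Hypothetical helper: compare the two saved x positions.
\newcommand{\showdirection}[1]{%
  \ifnum\zposx{#1-a}=\zposx{#1-b} %
    unknown (rerun)%
  \else
    \ifnum\zposx{#1-a}<\zposx{#1-b} %
      left-to-right%
    \else
      right-to-left%
    \fi
  \fi}

\begin{document}
Plain text \probedirection{ltr} is \showdirection{ltr}.

Now \beginR reversed \probedirection{rtl} text\endR\ is \showdirection{rtl}.
\end{document}
```

Each probe costs two labels, which is exactly the scaling problem mentioned above: a document with many links would need two extra labels per link.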

In LuaTeX the node lists can be examined and the switches can be done at Lua level.
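As an illustration of what "examining the node lists" could look like, here is a minimal sketch for lualatex. It only inspects the list and does not attempt to reorder anything. It assumes a reasonably recent LuaTeX in which direction changes are dir nodes, and it uses the luacode package; the function name report and the callback description are made up for this example.

```latex
% Compile with lualatex; the findings are written to the log file.
\documentclass{article}
\usepackage{luacode}
\usepackage{hyperref}
\begin{luacode*}
local DIR     = node.id("dir")
local WHATSIT = node.id("whatsit")
local START   = node.subtype("pdf_start_link")
local STOP    = node.subtype("pdf_end_link")

-- Walk the top level of each paragraph list before line breaking and
-- log direction nodes and link start/end whatsits in list order.
local function report(head)
  for n in node.traverse(head) do
    if n.id == DIR then
      texio.write_nl("log", "dir node: " .. tostring(n.dir))
    elseif n.id == WHATSIT then
      if n.subtype == START then
        texio.write_nl("log", "link start")
      elseif n.subtype == STOP then
        texio.write_nl("log", "link end")
      end
    end
  end
  return true -- leave the list unchanged
end

luatexbase.add_to_callback("pre_linebreak_filter", report,
  "report-dir-and-links")
\end{luacode*}
\begin{document}
Left-to-right \href{http://google.com}{Google} and
{\textdir TRT right-to-left \href{http://google.com}{Google}} again.
\end{document}
```

A real fix would have to go further and swap or re-pair the link nodes relative to the direction nodes, which this sketch does not do.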

I also asked him why this problem happens with pdfTeX but not with XeTeX. Here is his response:

Perhaps the \specials are resorted automatically in xdvipdfmx.
