[Tex/LaTex] Placing the un-ligatured text in the OCR layer

ligatures pdf

I was reading "Why can't 'fi' be separated when being copied from a compiled pdf?" and had a thought:

I know that PDF documents from version 1.4 onward have an OCR layer. Would it be possible to have pdfTeX or LuaTeX place the un-ligatured text into the OCR layer, so that you don't get inconsistent or odd behaviour when copying ligatures?

It also occurs to me that this could be a workaround for my troubles copying math-mode Greek characters, discussed in "Proper way to use greek letters in an English document". It also seems like an ideal way to provide more accessible code for equations, as trying to copy them right now is not exactly useful.

Best Answer

Two possible tools you can use to achieve what you are after:

The first is calibre-ebook, which can do PDF-to-PDF conversions and also handles a large number of other formats.

The second is Heiko Oberdiek's experimental package accsupp. It lets you attach alternative text to characters and other content for accessibility purposes, and it will achieve what you want as well. At the end of the day, I think it is high time we stopped using ligatures. They don't offer much in terms of improving typography on a screen.

The example below is taken from the package; you will need to adapt it to suit.

\documentclass{article}
\usepackage[unicode]{hyperref}
\usepackage{accsupp}[2007/11/14]
\begin{document}
  \begin{equation}
    \BeginAccSupp{
      method=pdfstringdef,
      unicode,
      ActualText={%
        a\texttwosuperior +b\texttwosuperior
        =c\texttwosuperior
      }
    }
    a^2 + b^2 = c^2
    \EndAccSupp{}
  \end{equation}
\end{document}

Your suggestion to map it to the OCR layer is not an option unless you scan the PDF and OCR it. To summarize: start with calibre-ebook and, if it cannot offer what you want, explore the other options.
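As a rough illustration of how the package example might be adapted to the ligature case from the question, here is a minimal sketch. The macro name \copyas is hypothetical (it is not part of accsupp), and whether the replacement text is actually honoured when copying depends on the PDF viewer, so treat this as a starting point rather than a tested solution.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{accsupp}

% Hypothetical helper, not provided by accsupp:
%   #1 = material to typeset (may form a ligature),
%   #2 = plain text a viewer should hand out on copy and paste.
\newcommand*{\copyas}[2]{%
  \BeginAccSupp{method=escape,ActualText={#2}}%
  #1%
  \EndAccSupp{}%
}

\begin{document}
% The glyph shown is still the "fi" ligature; the ActualText entry
% asks the viewer to copy the two separate letters f and i instead.
We de\copyas{fi}{fi}ne the term here.
\end{document}
```

Wrapping every ligature by hand like this does not scale to a whole document; the sketch only shows the mechanism that the equation example above relies on.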

Related Solutions

[Tex/LaTex] Placing a PDF bookmark to the index page

\printindex starts a theindex environment, which by default calls \twocolumn, which (and this causes your problem) starts a new page, so a bookmark set just before \printindex ends up pointing at the previous page. The following should do the trick:

\cleardoublepage
\pdfbookmark[0]{\indexname}{idx}
\printindex

(You could also simply include the index in your table of contents.)
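For the table-of-contents alternative mentioned above, one possible sketch is shown below. It assumes the article class with makeidx and hyperref loaded; the section title and index entry are placeholders. With hyperref, the contents entry also produces a bookmark, so no separate \pdfbookmark is needed in this variant.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
\usepackage{hyperref}

\begin{document}
\tableofcontents

\section{Introduction}
Some text\index{text} to populate the index.

% Finish the current page first, so the anchor and the contents
% entry are created on the page where the index really starts.
\cleardoublepage
\phantomsection                             % hyperref anchor for the entry
\addcontentsline{toc}{section}{\indexname}  % list the index in the contents
\printindex
\end{document}
```

The file has to be processed with makeindex between LaTeX runs before the index itself appears.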

[Tex/LaTex] Plain text for cut and pasting, ligatures for viewing (or, disabling ligatures)

I think the accepted answer to "Is it possible to provide alternative text to use when copying text from the PDF?" suggests that the accsupp package will do it. You will either no longer be able to type in ligatures directly, or you will have to make your ligature characters active.
