[TeX/LaTeX] Problems with copy and paste from PDF using lstlisting (Again)

copy/paste, extended-characters, pdf

I already posted a similar question about the minus sign (-), but now I have the same issue with the apostrophe ('), and the previous solutions do not seem to solve it. The problem seems unrelated to listings, since even \verb!'! does not produce a PDF from which the copied character has ASCII code 39. All I manage to get is an extended character (e2 80 99 according to hexdump).

Here's a MWE:

\documentclass{article}
\begin{document}
\verb!'!
\end{document}

So the question is:

How can I create a PDF that shows a single quote that when copied has the ASCII value of a single quote?

Best Answer

In the absence of a ToUnicode CMap, PDF viewers use heuristics to map glyphs to Unicode code points, but beyond ASCII this can be flaky and work only some of the time in some viewers (see for example this patent). That is why Stephan Lehmke sees different results from Yossi Farjoun.

You can add a ToUnicode table easily enough, but the glyph in question is quoteright, which usually maps to U+2019 (the e2 80 99 in your hexdump is its UTF-8 encoding). You could map quoteright to U+0027 (the code point of the quotesingle glyph), which solves your immediate problem, but right single quotes elsewhere in the document (outside listings) would then be affected too. I found that \pdfglyphtounicode has an undocumented "namespace" feature that restricts the remapping to a particular font, here the typewriter font cmtt10:

\documentclass{article}
\input glyphtounicode.tex
\input glyphtounicode-cmr.tex
\pdfglyphtounicode{tfm:cmtt10/quoteright}{0027}
\pdfgentounicode=1
\begin{document}
\verb!'! `hello'
\end{document} 
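
The tfm: prefix above names a single font metric file, so the mapping as written covers cmtt10 only. If the verbatim material is set at another size, a different font is used (in the standard classes \small selects cmtt9, \footnotesize cmtt8 and \large cmtt12). The following preamble lines are a sketch that assumes the namespace feature behaves the same way for those fonts; I have not checked it in every viewer:

% Assumption: same tfm:<font>/<glyph> syntax as for cmtt10 above.
% Add one line per typewriter font that the document actually uses.
\pdfglyphtounicode{tfm:cmtt8/quoteright}{0027}
\pdfglyphtounicode{tfm:cmtt9/quoteright}{0027}
\pdfglyphtounicode{tfm:cmtt12/quoteright}{0027}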

Actually, it turns out that cmtt10 does in fact contain a quotesingle glyph, which you may prefer to use anyway (in which case the \pdfglyphtounicode line above is not needed). To access this glyph, load \usepackage{textcomp}; \texttt{\textquotesingle} should then produce it. For the listings environment, I believe \lstset{upquote=true} makes it use this glyph whenever it sees an ASCII apostrophe.
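
A minimal sketch of the quotesingle route, assuming pdfLaTeX and the listings package (the basicstyle setting and the sample shell line are only illustrative). Whether the copied character really comes out as ASCII 39 may still depend on the viewer and on the fonts embedded in the PDF, so the ToUnicode lines are kept here as a precaution:

\documentclass{article}
\usepackage{textcomp}          % provides \textquotesingle, needed by upquote
\usepackage{listings}
\input glyphtounicode.tex      % precaution: maps quotesingle to U+0027
\pdfgentounicode=1
\lstset{basicstyle=\ttfamily, upquote=true}
\begin{document}
\texttt{\textquotesingle}      % straight quote outside listings

\begin{lstlisting}
echo 'hello'
\end{lstlisting}
\end{document}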
