[TeX/LaTeX] Viewer-independent copyable spaces at the beginning of a line

copy/paste pdf spacing verbatim

I need to include a couple of Python code listings in which the indentation of the lines (some number of leading spaces) is significant. I would like the code listings to be copyable, so the spaces at the beginning of each line need to be copied along with the text.

This question has been asked in various ways before (e.g. How to make listings code correct copyable from PDF and with hyperlink, or How can I make source code included with minted copyable?). Those questions focus on making line numbers uncopyable, though.

Making the spaces at the beginning of a line copyable seems to be harder: "I am not sure it is possible to specify in the PDF (at least in a viewer-independent way) that the indentation should be copied too" (CyberSingularity). At How to make listings code indentation remain unchanged when copied from PDF?, Philippe Goutet suggests a solution (turning the spaces into visible spaces and coloring them in the background color so that they appear invisible) that works in Acrobat Reader, but not in all readers. He says: "It works under Acrobat Reader and it's extremely pleasant to be able to quickly copy/paste code without problem (perhaps the problem can be circumvented by writing direct PDF code to tell that it's a space, I've never had the time to try)".

Is it possible to produce a PDF with a code listing with copyable real spaces at the beginning of a line?

Minimal example: The line return x should start with four spaces.

\documentclass{article}

\begin{document}
\begin{verbatim}
def myfunction(x):
    return x
\end{verbatim}
\end{document}

I know that I could attach the code to the PDF as a file, but that's not what I want.

Best Answer

(It seems this works everywhere except in Acrobat Reader.)

This is based on the example by @DavidCarlisle.

The cmtt visible space character seems to be labelled differently in different cmtt variants. For cm-super (which is loaded here because I use \usepackage[T1]{fontenc}), the character is named uni2423, which seems to cause problems in Evince when that character is copied.

So I rigorously mapped every glyph name that looks like a space to a non-breaking space (U+00A0).

You might want to restrict this to verbatim ;-)

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{color}
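\input{glyphtounicode}
% Map every space-like glyph name to U+00A0 (no-break space).
\pdfglyphtounicode{visiblespace}{A0}
\pdfglyphtounicode{blank}{A0}
\pdfglyphtounicode{visualspace}{A0}
\pdfglyphtounicode{uni2423}{A0} % name used by cm-super
\pdfgentounicode=1 % write ToUnicode maps into the PDF
\begin{document}\showoutput % \showoutput only adds box dumps to the log
\makeatletter
% Typeset each verbatim space as the glyph in slot 32 (the visible
% space in T1), colored white so that it stays invisible on the page.
\def\@xobeysp{\textcolor{white}{\char32}}
\makeatother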
\begin{verbatim}
def myfunction(x):
    return x
\end{verbatim}
\end{document}
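
One practical consequence of mapping the glyphs to A0: the text that lands on the clipboard will presumably contain U+00A0 characters rather than ordinary U+0020 spaces, and Python does not accept U+00A0 as indentation. The following is only a minimal sketch of cleaning up pasted code, assuming the viewer passes U+00A0 through unchanged; normalize_pasted_code is a hypothetical helper name, not part of any package.

```python
NBSP = "\u00a0"  # the character the glyphs above are mapped to


def normalize_pasted_code(text: str) -> str:
    """Replace every U+00A0 with an ordinary space (U+0020)."""
    return text.replace(NBSP, " ")


# Simulated clipboard content: the listing with four U+00A0 as indentation.
# (Assumption: this is what a viewer hands over when copying from the PDF.)
pasted = "def myfunction(x):\n\u00a0\u00a0\u00a0\u00a0return x\n"

cleaned = normalize_pasted_code(pasted)

# The cleaned text is valid Python again and can be executed.
namespace = {}
exec(cleaned, namespace)
print(namespace["myfunction"](3))  # prints 3
```

Many editors can do the same replacement with a search-and-replace for U+00A0, so the helper is mainly useful when pasted code is processed by a script.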

I am inclined to consider it a bug that Acrobat apparently cannot copy any consecutive or beginning-of-line spaces.

Or is this behavior specified anywhere?

At least the behavior is exactly the same with official Adobe documents such as the PDF Reference.

So I consider this answer valid no matter what :-)