[TeX/LaTeX] How to configure tcblisting with minted to allow copy/paste from listing with whitespace characters

copy/paste minted python tcblisting tcolorbox

This answer to a similar question gets very close to a complete copy/paste solution, but it does not work for me because whitespace is not selected. I am asking a new question because the linked one is several years old, and because mine focuses specifically on copying and pasting whitespace in listings, preferably with a cross-platform solution. That is, the resulting PDF should allow copy/paste from the document into an editor on Windows/Mac/Linux, using Acrobat/Evince/Okular/etc. If cross-platform in this sense is not realistic, I would at least like this capability on Linux with Evince, if that is possible.

I am working on a Fedora 29 machine and need to be able to copy and paste Python snippets from the PDF into an editor and run them without manual edits. This is critical for Python snippets, because code blocks are delimited by changes in whitespace indentation.

Here is my minimal working example (MWE).

\documentclass{article}
\usepackage{tcolorbox}
\usepackage{hyperref}
\tcbuselibrary{theorems,listings,skins,breakable,minted}

 % Patch accsupp to avoid copying line numbers when copying from listing
\usepackage{accsupp}
\newcommand\emptyaccsupp[1]{\BeginAccSupp{ActualText={}}#1\EndAccSupp{}}
\let\theHFancyVerbLine\theFancyVerbLine
\def\theFancyVerbLine{\rmfamily\tiny\emptyaccsupp{\arabic{FancyVerbLine}}}

\newcounter{listings}

\begin{document}

Listing~\ref{lst:example} shows an example code snippet.

\begin{tcblisting}{%
     theorem={Listing}{listings}{Example Listing}{lst:example},
     fonttitle=\scriptsize\bfseries,
     listing engine=minted,
     minted language=python,
     minted options={%
         linenos,
         breaklines,
         autogobble,
         fontsize=\small,
         numbersep=2mm,
         baselinestretch=1},
     overlay={%
       \begin{tcbclipinterior}
           \fill[gray!25] (frame.south west) rectangle ([xshift=4mm]frame.north west);
       \end{tcbclipinterior}},
     breakable, enhanced, listing only}
import os
import sys

SCRIPT_PATH = os.path.abspath(os.path.dirname(__file__))


class CopyPasteTest:
    """
    Testing that copy/paste works from pdf listings.
    """
    def __init__(self, numbers):
        for number in numbers:
            print(number)


if __name__ == "__main__":
    CopyPasteTest([1, 2, 3, 4])
\end{tcblisting}

\end{document}

This compiles to a PDF in which the listing is typeset as intended.

But when I select the text in Evince, the whitespace is not highlighted.

And when pasted into gedit, the code has lost its indentation and empty lines, which breaks the example.

Any help getting python code snippets to copy/paste out of a pdf into another editor would be greatly appreciated!

Best Answer

In my opinion it won't work reliably. PDF viewers don't respect space characters, and they don't all handle them in the same way. For example, compile this document with lualatex (pdflatex gives similar results):

\documentclass{article}
\usepackage{tagpdf}
\tagpdfsetup{activate-mc,uncompress,interwordspace=true}
\begin{document}
x\ \pdffakespace\ \pdffakespace\ \pdffakespace\ \pdffakespace    y
\end{document}

Then the PDF contains a run of real space glyphs (0067 is the space; the string below has five of them):

[<007400670067006700670067>525<0076>]TJ

But copy & paste (in both Adobe and Sumatra) gives only one space: x y.

I also tried with /ActualText (you could use accsupp for something similar):

\documentclass{article}
\usepackage{tagpdf}
\tagpdfsetup{activate-all,uncompress}
\begin{document}
\tagstructbegin{tag=Document}
\tagmcbegin{tag=P,raw={/ActualText <FEFF007800200020002000200079>}}
\verb+x     y+
\tagmcend
\tagstructend
\end{document}

This copies correctly with Adobe but not with Sumatra.
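
For comparison, the same /ActualText idea can be sketched with accsupp, which the question's preamble already loads. This is a minimal sketch only: the helper name \copyas is made up for illustration, and it assumes accsupp's method=hex and unicode options, with the hex string given as UTF-16BE (as far as I know, the package then adds the FEFF byte order mark itself). Since it produces the same kind of /ActualText entry as the tagpdf test above, expect the same viewer-dependent behaviour.

```latex
\documentclass{article}
\usepackage{accsupp}

% Hypothetical helper (not part of accsupp): typeset #2, but ask the
% viewer to copy the text given by the UTF-16BE hex string #1 instead.
\newcommand\copyas[2]{%
  \BeginAccSupp{method=hex,unicode,ActualText=#1}#2\EndAccSupp{}}

\begin{document}
% 0078 = x, 0020 = space (four times), 0079 = y
\copyas{007800200020002000200079}{\texttt{x\ \ \ \ y}}
\end{document}
```

Wrapping every run of indentation in a listing this way would be possible in principle, but it does not remove the dependence on how each viewer treats /ActualText.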

Similar differences and problems are mentioned in the question "How to change character code in Type1 font?".

It is better to embed or attach the code in the PDF than to rely on copy & paste.
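
One way to follow that advice is to keep the snippet in a file, typeset the file with \tcbinputlisting, and attach the same file to the PDF. The sketch below makes several assumptions: the file name example.py, the link text, and the choice of the attachfile package are illustrative; it assumes compilation with pdflatex -shell-escape (minted requires shell escape) and a LaTeX kernel from 2019-10-01 or later for the [overwrite] option of filecontents*.

```latex
\documentclass{article}
\usepackage{tcolorbox}
\tcbuselibrary{listings,minted,skins,breakable}
\usepackage{hyperref}
\usepackage{attachfile}% provides \textattachfile

% Write the snippet to disk, so that the typeset listing and the
% attachment come from the same file (example.py is a placeholder).
\begin{filecontents*}[overwrite]{example.py}
def greet(names):
    for name in names:
        print(name)


if __name__ == "__main__":
    greet(["a", "b"])
\end{filecontents*}

\begin{document}

% Typeset the file for reading ...
\tcbinputlisting{%
  listing engine=minted,
  minted language=python,
  minted options={linenos,breaklines,fontsize=\small},
  listing file=example.py,
  listing only, enhanced, breakable}

% ... and attach the same file for exact retrieval.
\textattachfile{example.py}{Download \texttt{example.py}}

\end{document}
```

The reader then saves the attachment instead of selecting text, so indentation and blank lines are preserved byte for byte. How attachments are presented still depends on the viewer, so it is worth checking the viewers you care about.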
