[TeX/LaTeX] Ensure verbatim code block is copy/paste-able

Tags: listings, pdftex, verbatim

After rendering a document containing this code block

\begin{verbatim}
if [ ! -d .git ]; then git init; fi         # Initialises a new Git repository, if doesn't already exist.
if [ ! -f README.md ]; touch README.md; fi  # Creates an empty README.md file,  if doesn't already exist.
git add -A                                  # Stages any files/directories present, in preparation to commit them to local Git repo.
git commit -m 'first commit'                # Commits the staged files/dirs to the local Git repo.
git remote add origin GIT_REMOTE_URL        # Adds the GitHub repo created above as a "Git remote" with the alias "origin".
\end{verbatim}

to PDF using pdflatex, and viewing the PDF in Apple's Preview application, the rendered code block looked exactly as expected:

if [ ! -d .git ]; then git init; fi         # Initialises a new Git repository, if doesn't already exist.
if [ ! -f README.md ]; touch README.md; fi  # Creates an empty README.md file,  if doesn't already exist.
git add -A                                  # Stages any files/directories present, in preparation to commit them to local Git repo.
git commit -m 'first commit'                # Commits the staged files/dirs to the local Git repo.
git remote add origin GIT_REMOTE_URL        # Adds the GitHub repo created above as a "Git remote" with the alias "origin".

However, I then tried copying and pasting the rendered code block from the PDF into a text file. I had been expecting the result to be exactly like the original, but instead it was as follows:

if[!-d.git];thengitinit;fi if [ ! -f README.md ]; touch README.md; fi git add -A git commit -m 'first commit' git remote add origin GIT_REMOTE_URL
# Initialises a # Creates an em # Stages any fi # Commits the s # Adds the GitH

Obviously, this is rather different to the original!

In Acrobat Professional 8 the result is also wrong, but in a different way:

if [ ! -d .git ]; then git init; fi # Initialises a if [ ! -f README.md ]; touch README.md; fi # Creates an emgit add -A # Stages any figit commit -m 'first commit' # Commits the sgit remote add origin GIT_REMOTE_URL # Adds the GitHEN

My question is: is there a way to ensure that the original contents of every \begin{verbatim}...\end{verbatim} environment are preserved in the PDF output, not only as seen by the eye but also as "seen" by the text-selection tools in PDF viewers?

Best Answer

Since you mentioned that page on using the listings package with PDF tagging, I thought I'd mention that I've had a bit more luck getting the spacing to work; see my PDF. As others pointed out, you still need to make sure the lines are short enough not to overflow their hboxes. In this case I've split up the comments and made the page landscape. The method below also works with file inclusion: use \lstinputlisting{script.txt} instead of \begin{lstlisting}...\end{lstlisting}.
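To check a copy-and-paste round trip like the ones shown in the question, it helps to compare the listing's source with the pasted text line by line. A minimal Python sketch, assuming both are available as strings; `printable_ascii_sample` (a hypothetical test file for `\lstinputlisting{script.txt}`) and `copy_paste_diffs` are illustrative helpers, not part of listings or accsupp:

```python
from itertools import zip_longest


def printable_ascii_sample() -> str:
    """One line per printable ASCII character (0x20-0x7E), each wrapped in 'x'
    so that even the space never sits at the edge of a line."""
    return "\n".join(f"{code:02X} x{chr(code)}x" for code in range(0x20, 0x7F))


def copy_paste_diffs(original: str, pasted: str) -> list[tuple[int, str, str]]:
    """Compare a listing's source with text copied out of the PDF.

    Returns (line number, expected line, pasted line) for every line that
    differs; a line missing on either side is compared against "".
    """
    diffs = []
    pairs = zip_longest(original.splitlines(), pasted.splitlines(), fillvalue="")
    for number, (expected, got) in enumerate(pairs, start=1):
        if expected != got:
            diffs.append((number, expected, got))
    return diffs
```

Saved as script.txt, the sample can be rendered, copied out of the PDF and passed back to `copy_paste_diffs` to see which characters survive.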

Since I am still an amateur at this kind of LaTeX voodoo, someone else may well be able to improve on it, but I have tested this method with all printable ASCII characters. A few things are not perfect; they may not be much of a problem, or someone more experienced may find them easy to fix:

  • I haven't tested it with the many possible listings options, so I don't know whether it plays nicely with them.
  • I went to quite some effort to ensure that all special printable ASCII characters were handled properly, but I can't make any promises.
  • Handling spacing was really painful. In the end, the only way I could get it working was to replace every pair of spaces with two small centred dots (\textperiodcentered from textcomp). The dots are visible in the PDF (they still copy as spaces, though!), so I can only hope they aren't too distracting. It may be possible to use some colour formatting to make them vanish; I don't really know. In practice you're only likely to see them in indented code, since normal text rarely has two spaces in a row.
  • I hear you ask: since it only replaces pairs of spaces, what happens to the others? Even numbers of spaces are no problem, because they are replaced two at a time. Most single spaces are not replaced but are still preserved in the copied text. The two cases where they are lost are at the very beginning and the very end of a line: a line that ends with an odd number of spaces loses one at the end, and a line that begins with a single space (followed immediately by a printable character) loses one at the start.
  • Edit: Oh, I forgot to mention: I didn't figure out a way to make blank lines copy. It's still a lot better than no copy and paste, though.
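The space handling described in the list above can be simulated outside LaTeX. A minimal Python sketch, assuming that listings matches the `{\ \ }` literate pattern left to right without overlap (as Python's `str.replace` does) and using U+00B7 MIDDLE DOT to stand in for \textperiodcentered; `pair_spaces` and `unpaired_edge_spaces` are hypothetical names:

```python
CENTRED_DOT = "\u00b7"  # stands in for \textperiodcentered


def pair_spaces(line: str) -> str:
    """Mimic the literate rule: each pair of spaces, matched left to right
    without overlap, is drawn as two centred dots."""
    return line.replace("  ", CENTRED_DOT * 2)


def unpaired_edge_spaces(line: str) -> int:
    """Count leftover single spaces at the very start or end of the drawn
    line, i.e. the spaces reported as lost when copying."""
    drawn = pair_spaces(line)
    count = 0
    if drawn.startswith(" "):
        count += 1
    if drawn.endswith(" ") and len(drawn) > 1:  # a lone " " is counted once
        count += 1
    return count
```

A run of n spaces becomes n // 2 dot pairs followed by at most one single space, so an indent made of an even number of spaces pairs up completely, and the leftover space of an odd run only matters when that run touches the edge of the line.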

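\documentclass{article}
\usepackage[landscape]{geometry}  % landscape pages leave room for long lines
\usepackage{listings}
\usepackage{textcomp}             % provides \textperiodcentered
\usepackage[space=true]{accsupp}  % provides \BeginAccSupp/\EndAccSupp (ActualText)

% \pdfactualhex{\name}{hex}{visible} defines \name to typeset <visible> while
% telling PDF viewers that its text is the hex-encoded string <hex>.
\newcommand{\pdfactualhex}[3]{\newcommand{#1}{%
\BeginAccSupp{method=hex,ActualText=#2}#3\EndAccSupp{}}}

% Two visible dots that copy as two spaces (hex 20 20).
\pdfactualhex{\pdfactualdspace}{2020}{\textperiodcentered\textperiodcentered}
% Straight quote (hex 27) and backtick (hex 60), so they copy as plain ASCII.
\pdfactualhex{\pdfactualsquote}{27}{'}
\pdfactualhex{\pdfactualbtick}{60}{`}

\lstset{tabsize=4,basicstyle=\ttfamily,columns=flexible,emptylines=10000}
% Substitute the macros above for ', ` and each pair of spaces; the trailing
% number is the width in columns that listings reserves for the replacement.
\lstset{literate={'}{\pdfactualsquote}1
                 {`}{\pdfactualbtick}1
                 {\ \ }{\pdfactualdspace}2
}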

\begin{document}
\begin{lstlisting}
if [ ! -d .git ]; then git init; fi         # Initialises a new Git repository,
                                            # if doesn't already exist.

if [ ! -f README.md ]; touch README.md; fi  # Creates an empty README.md file,
                                            # if doesn't already exist.

git add -A                                  # Stages any files/directories
                                            # present, in preparation to commit
                                            # them to local Git repo.

git commit -m 'first commit'                # Commits the staged files/dirs
                                            # to the local Git repo.

git remote add origin GIT_REMOTE_URL        # Adds the GitHub repo created
                                            # above as a "Git remote" with the
                                            # alias "origin".
\end{lstlisting}
\end{document}
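The second argument of \pdfactualhex is the ActualText written out as hex bytes: 2020 for two spaces, 27 for ' and 60 for `. For further literate rules, a small Python sketch can compute that value. It assumes printable ASCII can be written byte for byte (those characters coincide in PDFDocEncoding) and that anything else can be given as UTF-16BE prefixed with the FE FF byte-order mark, which is how the PDF specification marks Unicode text strings; that fallback has not been tested with accsupp. `actualtext_hex` and `pdfactualhex_line` are hypothetical helpers:

```python
def actualtext_hex(text: str) -> str:
    """Hex string suitable as the second argument of \\pdfactualhex."""
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        # Printable ASCII: same bytes in PDFDocEncoding, one byte per character.
        return text.encode("ascii").hex().upper()
    # Assumption: other text is given as UTF-16BE behind the FE FF byte-order mark.
    return "FEFF" + text.encode("utf-16-be").hex().upper()


def pdfactualhex_line(macro: str, text: str, visible: str) -> str:
    """Build a \\pdfactualhex definition line (hypothetical convenience helper)."""
    return "\\pdfactualhex{" + macro + "}{" + actualtext_hex(text) + "}{" + visible + "}"


if __name__ == "__main__":
    # Rebuilds the double-space definition used in the preamble.
    print(pdfactualhex_line(r"\pdfactualdspace", "  ", r"\textperiodcentered\textperiodcentered"))
```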

Here's a link to my PDF output: http://goo.gl/9Ds75
