[TeX/LaTeX] Text copied from PDF is missing spaces, or has extra ones

copy/paste newtx pdftex spacing

When I create a PDF with pdflatex and copy text from that PDF (using Adobe Reader DC on Windows 10), some of the spaces are missing. Here's an MWE:

\documentclass{article}
\usepackage{newtxtext}
\begin{document}
    Therefore, this work ... \hspace*{\linewidth}
\end{document}

When I copy text from that pdf, this is what I get (1 being the page number):

Therefore, thiswork ...
1

Removing the \hspace*, or removing newtxtext (or both), fixes the problem, but that's not what I want, of course (the \hspace* stands in for some text following "this work").

I have come across "Problem copying text from pdf – spaces being stripped" and "XeLaTeX and missing spaces in PDF text", which proposed \pdfgeninterwordspace, which is now \pdfinterwordspaceon (thanks, @egreg). So I tried that:

\documentclass{article}
\usepackage{newtxtext}
\pdfmapline{+dummy-space <dummy-space.pfb}
\pdfinterwordspaceon
\begin{document}
    Therefore, this work ... \hspace*{\linewidth}
\end{document}

(See Use pdfinterwordspaceon with pdflatex from MiKTeX on Windows if that does not compile for you.)

Now, when I copy text from that pdf, I get this:

Therefore,  this work  ... 
1

So basically, additional space has been introduced regardless of whether or not it was needed. Yes, the missing space in "thiswork" has been added, which is good; but so have three extra spaces after "Therefore,", "work", and "…", which is not good.

Is there a better solution? Am I using \pdfinterwordspaceon correctly?

Best Answer

This is at least partly a known issue with Adobe Reader. Adobe Reader fails to recognise spaces between words in certain cases (e.g. where the spacing is smaller than average) or recognises one space as multiple spaces (e.g. where the spacing is larger than average).

It is an issue in the viewer, not the file, as demonstrated by the fact that other viewers work fine, so there's not much to be done on the TeX side of things.
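
If you want to see roughly where a given viewer draws the line between "no space", "one space" and "several spaces", a small test file along the following lines may help. This is only a diagnostic sketch, not a fix: `\gap` is a hypothetical helper defined just for this test, and the widths are arbitrary sample values, not thresholds documented by Adobe. Each line sets the two words with an explicit gap of known width instead of a normal interword space, so that copying the text out of the viewer shows how each gap was interpreted.

```latex
\documentclass{article}
\usepackage{newtxtext}
% \gap{<width>} is a hypothetical helper for this test only.
% It typesets "this" and "work" separated by a fixed gap of the given
% width (no real interword space), followed by a label showing the width.
\newcommand{\gap}[1]{\noindent this\hspace{#1}work\quad(#1)\par}
\begin{document}
\gap{0.10em}
\gap{0.15em}
\gap{0.20em}
\gap{0.25em}
\gap{0.33em}
\gap{0.50em}
\gap{1.00em}
\end{document}
```

Compiling this with pdflatex and copying the page from Adobe Reader and from another viewer should make any difference in how the two treat the same file visible. Results will likely vary with the font and the viewer version.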