[TeX/LaTeX] Problem copying text from PDF – spaces being stripped

copy/paste, input-encodings, pdf, spacing

I have the following minimal working example:

\documentclass{book}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

\begin{document}

é canção

\end{document}

I need the utf8 option so that I can type special characters directly in the source, and I need the T1 option so that those characters are copied correctly from the PDF.

The problem is that when I copy the text from the PDF reader (I am using Foxit Reader), the space is not copied along with the text: I get écanção instead of é canção.

How can I solve this?

Best Answer

The original purpose of PDF was to represent printed documents, so there was no explicit way of representing a space character: a space was simply a horizontal gap between glyphs. With modern developments around PDF, people are interested in things like automatic reflow for small screens, structural information for document processing, and interfacing with screen readers for people with visual impairments. As a result, it is now possible to represent spaces explicitly (I believe it is even a requirement for some grade of PDF/A compliance). There is a patch for pdfTeX here, after which I believe you are supposed to add the following to your TeX file:

\pdfmapline{+dummy-space <dummy-space.pfb}
\pdfgeninterwordspace

I don't know if the patch still applies (the bug tracker claims it has been replaced by a branch, probably this one), and I haven't tested it to see if it actually solves the problem.
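
For what it is worth, my understanding is that the functionality of this patch later entered pdfTeX itself (from version 1.40.15, shipped with TeX Live 2014) under the primitive name \pdfinterwordspaceon rather than \pdfgeninterwordspace. Assuming such a pdfTeX, and assuming the dummy-space font is installed (it is in TeX Live, as far as I know), the example from the question would look roughly like this. Treat it as an untested sketch:

\documentclass{book}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

% Assumption: pdfTeX 1.40.15 or later; older versions do not know this primitive.
% Assumption: dummy-space.pfb is installed. The map line is the one quoted above
% and may be redundant if your distribution already maps the font.
\pdfmapline{+dummy-space <dummy-space.pfb}
\pdfinterwordspaceon

\begin{document}

é canção

\end{document}

Whether the copied text then contains the space still depends on how the PDF reader extracts text, so it is worth checking the result in Foxit Reader specifically.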