[TeX/LaTeX] Issue with diacritics in a Romanian-language document

copy/paste font-encodings pdf pdftex

I have a document written in Romanian, which uses characters with diacritics. It was created with TeXworks.
My document looks like this:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[romanian]{babel}

\begin{document}
ca răspuns la anunțul dumneavoastră
\end{document}

It looks fine in a PDF reader. But when I copy the text from the PDF into Notepad, I get:

ca r˘ aspuns la anunt , ul

This is copied instead of the original input. Can someone please shed some light as to why it does this?

Best Answer

There is currently no font encoding that fully supports Romanian, I'm afraid. New fonts would have to be created; not from scratch, but it would be a big job and few of them would be available anyway.

The problem is that in the standard encoding OT1, no accented letter is present; with T1 the situation is slightly better because ÂâÎîĂă are present as precomposed characters, but ȘșȚț aren't, so they have to be built from other pieces.

This has two bad consequences: Romanian words cannot be hyphenated properly past an occurrence of ȘșȚț, and these characters cannot be safely copied from the PDF. A workaround for copying, though not an easy one, is available with the accsupp package; however, it only works with Adobe Acrobat and no other viewer that I know of.
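The accsupp workaround could look roughly like the sketch below, which keeps the preamble from the question and attaches the intended Unicode character to each composite letter as ActualText. The macro name \copyas is hypothetical, introduced here only for illustration; \BeginAccSupp and \EndAccSupp come from accsupp. Whether the text then copies correctly still depends on the viewer.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[romanian]{babel}
\usepackage{accsupp}

% Hypothetical helper (not part of accsupp):
%   #1 = Unicode code point as four hex digits
%   #2 = the material to typeset
% The PDF shows #2 but is told that the "actual text" is U+#1.
\newcommand*{\copyas}[2]{%
  \BeginAccSupp{method=hex,unicode,ActualText=#1}%
  #2%
  \EndAccSupp{}%
}

\begin{document}
% U+0103 is a with breve, U+021B is t with comma below
ca r\copyas{0103}{ă}spuns la anun\copyas{021B}{ț}ul dumneavoastră
\end{document}
```

Every affected letter has to be wrapped by hand, which is what makes the workaround laborious, and it does nothing for hyphenation.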

The T1 encoding unfortunately has ŞşŢţ (with a cedilla), which are not acceptable in good Romanian typography, where the mark is a comma below. The reason is historical: when the T1 encoding was decided upon, support for Turkish was easy to add and that language has Şş; the “t” with the cedilla was added alongside (and it was wrong to begin with, in my opinion).

With XeLaTeX/LuaLaTeX the situation is much better: many fonts support the Romanian letters ȘșȚț as well as the other ones. The following works out of the box with XeLaTeX/LuaLaTeX:

\documentclass{article}
\usepackage[romanian]{babel}
\usepackage{fontspec}

\begin{document}

ca răspuns la anunțul dumneavoastră

\end{document}

and copy/paste from the PDF gives

ca răspuns la anunțul dumneavoastră
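Since many OpenType fonts carry these letters, a different text font can be chosen with fontspec's \setmainfont. The following is only a sketch: the font name TeX Gyre Pagella is an assumption, picked as an example, and it must be installed where the engine can find it.

```latex
% Compile with XeLaTeX or LuaLaTeX, not pdfLaTeX
\documentclass{article}
\usepackage[romanian]{babel}
\usepackage{fontspec}

% Assumption: this font is installed and visible to the engine;
% replace the name with any font that has the Romanian letters
\setmainfont{TeX Gyre Pagella}

\begin{document}

ca răspuns la anunțul dumneavoastră

\end{document}
```

If the chosen font lacks a glyph for ș or ț, the character will be missing from the output, so it is worth checking the PDF after switching fonts.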