I have a document written in Romanian, which uses characters with diacritics. It was created with TeXworks.
My document looks like this:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[romanian]{babel}
\begin{document}
ca răspuns la anunțul dumneavoastră
\end{document}
It looks fine when viewed in a PDF reader. When I try to copy the text from the PDF into Notepad, I get:
ca r˘ aspuns la anunt , ul
This is copied instead of the original input. Can someone please shed some light as to why it does this?
Best Answer
There is currently no font encoding fully supporting Romanian, I'm afraid. New fonts should be created: not from scratch, but it would be a big job and you'd not have many of them available anyway.
The problem is that the standard encoding OT1 contains no accented letters. With T1 the situation is slightly better, because ÂâÎîĂă are present as precomposed characters, but ȘșȚț aren't, so they have to be built from other pieces.

This has two bad consequences: Romanian words cannot be hyphenated properly beyond ȘșȚț, and the characters cannot be safely copied from the PDF. A workaround for copying, though not an easy one, is available with the accsupp package; however, it only works with Adobe Acrobat and no other viewer that I know of.

The T1 encoding unfortunately has ŞşŢţ, which are not good enough for good Romanian typography. The reason is historical: when the T1 encoding was decided upon, support for Turkish was easy to add and that language has Şş; the “t” with the cedilla was added as well (and it was wrong to begin with, in my opinion).

With XeLaTeX/LuaLaTeX the situation is much better: many fonts support the Romanian letters ȘșȚț as well as the other ones. The following works out of the box with XeLaTeX/LuaLaTeX:
and copy/paste from the PDF gives
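
For anyone who wants to try the XeLaTeX/LuaLaTeX route, here is a minimal sketch. It is not the example from the answer itself. It assumes the fontspec package with its default Latin Modern OpenType fonts, keeps babel from the question, and reuses the question's sample sentence. Compile it with xelatex or lualatex rather than pdflatex.

```latex
% Minimal sketch; compile with xelatex or lualatex (not pdflatex).
% Assumption: the default fonts loaded by fontspec (Latin Modern)
% contain the comma-below letters, so no \setmainfont is needed.
\documentclass{article}
\usepackage{fontspec}          % Unicode font loading; no inputenc/fontenc needed
\usepackage[romanian]{babel}   % Romanian hyphenation patterns and captions
\begin{document}
ca răspuns la anunțul dumneavoastră

% All Romanian diacritics, entered directly as UTF-8:
ȘșȚț ÂâÎîĂă
\end{document}
```

Because each letter is then a single Unicode glyph in the font instead of a base letter plus a separately placed accent, PDF viewers should be able to extract the text as typed. If a different font is preferred, it can be selected with \setmainfont, provided that font actually contains ȘșȚț.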