[TeX/LaTeX] Embedded fonts with/without fontenc/inputenc

Tags: accents, font-encodings, fonts

Let me explain. I tried to select part of the text in a PDF file to copy and paste it, and noticed that the accented letters disappear from the selection, as the image below shows.

[Image: selected text in the PDF; the accented letters are not part of the selection]

So I opened the document properties of the PDF file to check which fonts were used, and found this:

[Image: font properties of the PDF]

The PDF was created with pdflatex on TeX Live 2011, and the MWE is:

\documentclass[11pt,a4paper]{report}
\usepackage[latin1]{inputenc}

\begin{document}\thispagestyle{empty}
dimensões 
\end{document}

The PDF viewer is Evince 3.2.1.

I use latin1 instead of utf8 because utf8 does not let me type accented characters such as õ. Compiling produces the error

inputenc: Keyboard character used is undefined
(inputenc) in inputencoding `utf8'.
dimensõ

Loading fontenc

So I kept latin1 and also loaded the fontenc package with the T1 option. The output is OK, and now I can select the accented letters, as the image shows:

[Image: selected text in the PDF, now including the accented letters]

Then I checked the PDF properties again, and here is the surprising part: the embedded font is a different one! Why?

[Image: font properties of the PDF, listing a different embedded font]

Best Answer

Assuming, to begin with, that you don't want bitmap fonts embedded in the PDF, here are some facts about the problem. I'll deal only with pdfLaTeX; with XeLaTeX or LuaLaTeX and fontspec it's a different matter.

  1. With the default OT1 encoding, accented letters are produced by combining two characters (an accent glyph placed over a separate letter glyph), which makes "copy-paste" impossible.

  2. The font must be available in .pfb (or .pfa) format.

  3. In order to do "copy-paste" from the PDF, the font should also have a correct correspondence between the glyphs and their names.

The link between a TeX font and its Type1 counterpart is provided by the pdftex.map file. When you use the default output encoding and Computer Modern fonts, the relevant line in pdftex.map is

cmr10 CMR10 <cmr10.pfb

The first column is the TeX font name, the second is the PostScript name of the font contained in the file to load, here cmr10.pfb. Note that with the 11pt class option you are really using the 10 point font scaled to 10.95pt.
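To see which TeX font name is actually selected (the name pdfTeX looks up in the first column of pdftex.map), one can typeset \fontname\font. The following is a minimal sketch, not part of the original answer: with the 11pt option of the standard report class, the normal size should report something like "cmr10 at 10.95pt", while \small (10pt in that class, the design size of cmr10) should report just "cmr10".

\documentclass[11pt,a4paper]{report}
% No fontenc: the default OT1 encoding and Computer Modern are used.

\begin{document}
\thispagestyle{empty}
% \fontname\font expands to the TFM name of the current font, followed by
% "at <size>" when the font is used at a size other than its design size.
normalsize: \fontname\font

{\small small: \fontname\font\par}
\end{document}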

When the T1 output encoding is chosen, an extension of the Computer Modern fonts containing accented characters is used instead: the European Modern fonts. They are not exactly identical to the CM fonts, but for practical purposes we can assume they are. The relevant line in pdftex.map is

ecrm1095 SFRM1095 " T1Encoding ReEncodeFont " <cm-super-t1.enc <sfrm1095.pfb

This line is more complex than the previous one. The sfrm1095.pfb font file contains a huge number of glyphs, because it is also used for the TS1, T2A, T2B, T2C and X2 encodings (text companion and Cyrillic fonts). Thus only part of it must be picked out for T1, which is done by the cm-super-t1.enc encoding vector together with the ReEncodeFont instruction.

These Type1 counterparts of the European Modern fonts are provided by the so-called CM-Super fonts, which are not included in minimal distributions. So if you want other people to compile the same TeX document with the same result, make sure they have installed the corresponding (meta)package from their TeX distribution.

An alternative is to use the Latin Modern fonts. With a document such as

\documentclass[11pt,a4paper]{report}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage{lmodern}

\begin{document}
\thispagestyle{empty}
dimensões 
\end{document}

the Type1 font will be chosen according to the line

ec-lmr10 LMRoman10-Regular " enclmec ReEncodeFont " <lm-ec.enc <lmr10.pfb

Without the fontenc package, the font will be given by

rm-lmr10 LMRoman10-Regular " enclmrm ReEncodeFont " <lm-rm.enc <lmr10.pfb

The lm-rm.enc file also defines glyphs in the "upper half" of the font table, but their arrangement only resembles the Latin-1 encoding.
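A small test file can print the TeX font name selected for each combination of encoding and family discussed in this answer. This is a sketch, not from the original answer; \fontcheck is a hypothetical helper defined only for this test. The printed names should be the first columns of the corresponding pdftex.map lines quoted above (cmr10, ecrm1095, rm-lmr10, ec-lmr10), possibly followed by an "at" size.

\documentclass[11pt,a4paper]{report}
% Load both output encodings; OT1, the last option, stays the default.
\usepackage[T1,OT1]{fontenc}

% \fontcheck{<encoding>}{<family>}: hypothetical helper that switches to the
% given encoding and family and prints the TFM name of the resulting font.
\newcommand\fontcheck[2]{%
  {\fontencoding{#1}\fontfamily{#2}\selectfont
   #1/#2: \fontname\font\par}}

\begin{document}
\thispagestyle{empty}
\fontcheck{OT1}{cmr}% Computer Modern
\fontcheck{T1}{cmr}%  European Modern (CM-Super Type1 files)
\fontcheck{OT1}{lmr}% Latin Modern, OT1
\fontcheck{T1}{lmr}%  Latin Modern, T1
\end{document}

Loading the lmodern package is not needed here, because \fontfamily{lmr} selects the Latin Modern .fd files directly.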

If you plan to use accented characters in your TeX input file, always load inputenc with the option matching the encoding the file is saved in, and fontenc with the appropriate output encoding. Otherwise you might get surprising results, as the following MWE shows (note the commented out lines):

% -*- coding: latin-1 -*-
\documentclass[11pt,a4paper]{report}

%\usepackage[T1]{fontenc}
%\usepackage[latin1]{inputenc}

\usepackage{lmodern}

\begin{document}
\thispagestyle{empty}

dimensões 

«straße»
\end{document}

[Image: compiled output of the MWE above]

You'd get the same by uncommenting only the fontenc line.
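For comparison, here is a sketch of the same test with both packages active, this time assuming the file is saved in UTF-8 rather than Latin-1. The inputenc option must name the encoding the editor actually uses when saving the file; this variant is an illustration, not part of the original answer.

% -*- coding: utf-8 -*-
% Assumption: the editor really saves this file in UTF-8.
\documentclass[11pt,a4paper]{report}

\usepackage[T1]{fontenc}    % output encoding with precomposed accented glyphs
\usepackage[utf8]{inputenc} % must match the encoding the file is saved in

\usepackage{lmodern}        % Type1 fonts for T1 without requiring CM-Super

\begin{document}
\thispagestyle{empty}

dimensões

«straße»
\end{document}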
