The two packages address different problems. inputenc allows the user to input accented characters directly from the keyboard; fontenc is oriented to output, that is, what fonts to use for printing characters.
The two packages are not connected, though it is best to call fontenc first and then inputenc.
With \usepackage[T1]{fontenc} you choose an output font encoding that supports the accented characters used by the most widespread European languages (German, French, Italian, Polish and others). This matters because otherwise TeX would not correctly hyphenate words containing accented letters.
With \usepackage[<encoding>]{inputenc} you can directly input accented and other characters. What's important is that <encoding> matches the encoding in which the file was saved, and this depends on your operating system and the settings of your text editor.
If calling only \usepackage[T1]{fontenc} you seem to get correct output, then your files are probably encoded in Latin-1 (also called ISO 8859-1), but beware that the correspondence is not complete: for example, typing ß you'd get SS in the output, which is obviously incorrect. Thus your editor might be set up for Latin-1, and so the correct call should be
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
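Put into a complete file, the two calls could look like the following minimal sketch. It assumes pdfLaTeX and a file that the editor really saves as Latin-1; the sample words are only an illustration.

```latex
% Assumption: the editor saves this file in Latin-1 (ISO 8859-1).
\documentclass{article}
\usepackage[T1]{fontenc}      % output side: T1-encoded fonts
\usepackage[latin1]{inputenc} % input side: must match the file's actual encoding
\begin{document}
Straße, für, Königsberg
\end{document}
```

If the same text is saved as UTF-8 instead, the option would have to be utf8; the option names the encoding of the file, not a preference.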
How do the packages work? Let's work through an example with these two encodings and the character ä.
First of all one must remember that TeX knows nothing about file encodings: all it really sees is the character number.
When you type ä in an editor set up for Latin-1, the machine stores character number 228.
When TeX reads the file it finds character number 228, and the macros of inputenc transform this into \"a.
Now fontenc comes into action; the command \" has an associated table of the known accented characters the font has available, and ä is among these, so the sequence \"a is transformed into the command "print character 228" in the current (T1-encoded) font.
In this case the two numbers coincide. This is not so, for instance, for ß:
The machine stores character number 223.
The macros of inputenc change this into \ss.
fontenc transforms this into "print character 255" (the slot where T1-encoded fonts have the ß character).
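The slot numbers mentioned above can be checked with the TeX primitive \char, which prints a given slot of the current font. This is a minimal sketch, assuming pdfLaTeX and installed T1-encoded fonts; the test lines are only an illustration.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
\"a{} and \char228{} % both give slot 228 of the T1 font

\ss{} and \char255{} % both give slot 255, the T1 position of the sharp s

\char223{} % slot 223 in T1 holds the SS glyph, not the sharp s
\end{document}
```

The last line is what a Latin-1 byte 223 would turn into if fontenc were loaded without inputenc.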
UTF-8
The situation is a bit different when \usepackage[utf8]{inputenc} is used (and the file is UTF-8 encoded, of course). When the text editor shows ä or ß, the file actually contains two-byte sequences, respectively <C3><A4> and <C3><9F>.
The first byte is a prefix that carries some information, chiefly that it introduces a two-byte character. Now inputenc makes all legal prefixes active, so <C3> behaves like a macro; its definition is to look at the next character, interpret the whole pair according to Unicode rules, and transform it into the corresponding code point, respectively U+00E4 and U+00DF.
Other prefixes announce three- or four-byte combinations, but the behavior is essentially the same: instead of one more character, two or three are absorbed and the translation into a code point is performed.
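As a worked check of the two examples, the standard UTF-8 rule for two-byte sequences keeps the low five bits of the prefix and the low six bits of the second byte. The lines below show only the arithmetic of the encoding, with hexadecimal values marked by a subscript 16; they are not meant to describe how the inputenc macros carry it out internally.

```latex
% <C3><A4>: prefix C3, continuation byte A4
\[
  (\mathrm{C3}_{16} - \mathrm{C0}_{16}) \cdot 64 + (\mathrm{A4}_{16} - 80_{16})
  = 3 \cdot 64 + 36 = 228 = \mathrm{E4}_{16}
\]
% <C3><9F>: prefix C3, continuation byte 9F
\[
  (\mathrm{C3}_{16} - \mathrm{C0}_{16}) \cdot 64 + (\mathrm{9F}_{16} - 80_{16})
  = 3 \cdot 64 + 31 = 223 = \mathrm{DF}_{16}
\]
```

The results, 228 and 223, are the same numbers that Latin-1 uses for ä and ß, since the first 256 Unicode code points agree with Latin-1.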
In ot1enc.dfu and t1enc.dfu we find
\DeclareUnicodeCharacter{00DF}{\ss}
\DeclareUnicodeCharacter{00E4}{\"a}
Oh, wait! There's something more! Yes, in this case inputenc interacts with fontenc (which it doesn't for other input encodings): for every loaded encoding, the corresponding .dfu file (Unicode definitions) is read before the document starts. This is the reason why I prefer to always load fontenc before inputenc (though it is not really necessary).
Those declarations provide the necessary setup: the combinations <C3><A4> and <C3><9F> get translated into \"a and \ss respectively, and everything works from then on as described for latin1.
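The same \DeclareUnicodeCharacter command can also be used in a document preamble for a character that the loaded encodings leave undefined. The sketch below is only an illustration: the choice of U+2212 (minus sign) and its rendering are assumptions, and recent LaTeX releases may already define this character.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
% Illustrative choice: map U+2212 MINUS SIGN to a math-mode minus.
\DeclareUnicodeCharacter{2212}{\ensuremath{-}}
\begin{document}
The temperature is −5 degrees.
\end{document}
```

The first argument is the code point in hexadecimal, the second is the LaTeX code to run whenever that character is read.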
Caveat
Here's another issue that can pop up at times (see Available Characters with iso-8859-1). The Latin-1 encoding provides the yen character at slot 0xA5 (decimal 165). According to the description above, the latin1 option to inputenc defines the \textyen translation for this, but the T1 output encoding reserves no slot for it, so inputting ¥ results in a runtime LaTeX error. One has to load a package providing a default output for \textyen, for instance textcomp. It would be the same with the utf8 input encoding.
Only characters covered by the output encoding or that are given a suitable rendering in terms of it can be safely input.
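The fix for the yen case can be sketched as follows, assuming pdfLaTeX and a UTF-8 encoded file; the sample sentence is made up.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % supplies a default rendering for \textyen
\begin{document}
The price is 100\,\textyen{} (typing ¥ directly gives the same result).
\end{document}
```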
Best Answer
Assuming, to begin with, that you don't want bitmap fonts embedded in the PDF, here are some facts about the problem. I'll deal with pdfLaTeX; for XeLaTeX or LuaLaTeX with fontspec it's a different matter.
With the default OT1 encoding, accents are realized by combining two characters, which makes "copy-paste" impossible.
The font must be available in .pfb (or .pfa) format.
In order to do "copy-paste" from the PDF, the font should also have a correct correspondence between the glyphs and their names.
The link between a TeX font and its Type1 counterpart is provided by the pdftex.map file. When you use the default output encoding and Computer Modern fonts, the relevant line in pdftex.map is
The first column is the TeX font name, the second is the PostScript name found in the loaded file, which is cmr10.pfb. Note that when using 11pt type you are really using the scaled 10 point font.
When the T1 output encoding is chosen, instead of the Computer Modern fonts, an extension with accented characters is used: the European Modern fonts. They are not exactly like the CM fonts, but for practical purposes we assume they are. The relevant line in pdftex.map is
which is more complex than the other one. The sfrm1095.pfb font file indeed contains a huge number of glyphs: it is also used for the TS1, T2A, T2B, T2C and X2 encodings (text companion and Cyrillic fonts). Thus only a part of it must be picked up, which is done by the ReEncodeFont instruction.
These Type1 counterparts for the European Modern fonts are provided by the so-called CM-Super fonts, which are not included in minimal distributions. So if you want people to be able to compile the same TeX document with the same result, ensure they have the (meta)package from their TeX distribution.
An alternative is using Latin Modern fonts. When you have a document such as
the Type1 font will be chosen according to the line
Without the fontenc package, the font will be given by
In the lm-rm.enc file, glyphs in the "upper half" of the font table are also defined, but the correspondence is only similar to the Latin-1 encoding.
If you plan to use accented characters in your TeX input file, always add the corresponding call of inputenc and the correct call of fontenc. Otherwise you might get surprising results, as the following MWE shows (note the commented out lines):
You'd get the same by uncommenting only the fontenc line.
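Following that advice, a preamble for pdfLaTeX with Latin Modern could be sketched as below. This is not the example from the answer: the utf8 option assumes the file is saved as UTF-8, and the sample text is made up.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}    % output encoding with precomposed accented letters
\usepackage[utf8]{inputenc} % assumption: the file is saved as UTF-8
\usepackage{lmodern}        % Latin Modern, available in Type1 format
\begin{document}
Gödel, naïve, Straße
\end{document}
```

With both packages loaded, accented letters are single glyphs in the output font.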