I just don't seem to get it. I come from the German-speaking part of Switzerland, and naturally I need ä ü ö a lot when I write German texts. In LaTeX I have the option to go via
"a "u "o
which isn't really comfortable, or I can use an additional package. So now the confusion starts.
I can use either
\usepackage[T1]{fontenc}
or
\usepackage[latin1]{inputenc}
So now if I create a new document and save it as a Latin-1 encoded .tex file, it seems it doesn't matter which option I choose: they both work. Now why should I pick one over the other, or use both? Which problems could occur? (Do they occur if I use the same .tex file on another OS, depending on which option I have chosen?)
If I save the .tex file encoded as, let's say, Mac OS Latin, then the first option compiles but shows the wrong characters for ä ü ö.
Is it that LaTeX maps the ä to a character according to Latin-1, and since the document is encoded as something other than Latin-1, it ends up as something different again?
So if the last two paragraphs have been confusing (mainly because I'm confused), the main question is: What does fontenc do that inputenc doesn't, and vice versa?
Best Answer
The two packages address different problems.
inputenc allows the user to input accented characters directly from the keyboard;
fontenc is oriented to output, that is, what fonts to use for printing characters.
The two packages are not connected, though it is best to call fontenc first and then inputenc.
With
\usepackage[T1]{fontenc}
you choose an output font encoding that has support for the accented characters used by the most widespread European languages (German, French, Italian, Polish and others), which is important because otherwise TeX would not correctly hyphenate words containing accented letters.
With
\usepackage[<encoding>]{inputenc}
you can directly input accented and other characters. What's important is that <encoding> matches the encoding with which the file has been written, and this depends on your operating system and the settings of your text editor.
If calling only
\usepackage[T1]{fontenc}
you seem to get correct output, then your files are probably encoded with Latin-1 (also called ISO 8859-1), but beware that the correspondence is not complete: for example, typing ß you'd get SS in output, which is obviously incorrect. Thus your editor might be set up for Latin-1 and so the correct call should be
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
How do the packages work? Let's do the example for these two encodings and the character ä.
First of all one must remember that TeX knows nothing about file encodings: all it really sees is the character number.
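As a concrete reference for the walk-through that follows, here is a minimal sketch of a complete document that loads both packages. It assumes the editor saves the file as Latin-1, which is an assumption about your setup and not something LaTeX can check; the article class and the sample text are only illustrative.

```latex
% Assumption: this file is saved by the editor as Latin-1 (ISO 8859-1).
\documentclass{article}
\usepackage[T1]{fontenc}      % output side: T1-encoded fonts with accented glyphs
\usepackage[latin1]{inputenc} % input side: must match the file's real encoding
\begin{document}
ä ö ü ß % typed directly; inputenc turns them into \"a, \"o, \"u, \ss
\end{document}
```

If the editor is set to a different encoding, only the option passed to inputenc should need to change.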
1. When you type ä in an editor set up for Latin-1, the machine stores character number 228.
2. When TeX reads the file it finds the character number 228 and the macros of inputenc transform this into \"a.
3. Now fontenc comes into action; the command \" has an associated table of the known accented characters the font has available, and ä is among these, so the sequence \"a is transformed into the command "print character 228" in the current (T1-encoded) font.
In this case the two coincide. This is not the case, for instance, of ß:
1. The machine stores character number 223.
2. The macros of inputenc change this into \ss.
3. fontenc transforms this into "print character 255" (where T1-encoded fonts have a ß character).
UTF-8
The situation is a bit different when \usepackage[utf8]{inputenc} is used (and the file is UTF-8 encoded, of course). When the text editor shows ä or ß, the file actually contains two-byte sequences, respectively <C3><A4> and <C3><9F>.
The first byte is a prefix that carries some information, mainly that it introduces a two-byte character. Now inputenc makes all legal prefixes active, so <C3> behaves like a macro; its definition is to look at the next character, interpret the whole pair according to Unicode rules, and transform it into the corresponding code point, respectively U+00E4 and U+00DF.
Other prefixes announce three- or four-byte combinations, but the behavior is essentially the same: instead of one more character, two or three others are absorbed and the translation into a code point is performed.
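A minimal sketch of the UTF-8 setup, assuming the file really is saved as UTF-8; the class and the sample text are illustrative. To my knowledge, LaTeX releases from 2018 on already use UTF-8 as the default input encoding, in which case the inputenc line is harmless but redundant.

```latex
% Assumption: this file is saved by the editor as UTF-8.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % makes prefix bytes such as <C3> active
\begin{document}
ä ß % stored as <C3><A4> and <C3><9F>, read as U+00E4 and U+00DF
\end{document}
```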
Oh, wait! There's something more! Yes, in this case inputenc interacts with fontenc (which it doesn't for other input encodings): for every loaded encoding, the corresponding .dfu file (Unicode definitions) is read before the document starts. This is the reason why I prefer to always load fontenc before inputenc (though not really necessary).
In ot1enc.dfu and t1enc.dfu we find
\DeclareUnicodeCharacter{00E4}{\"a}
\DeclareUnicodeCharacter{00DF}{\ss}
Those declarations provide the necessary setup: the combinations <C3><A4> and <C3><9F> get translated into \"a and \ss respectively and everything works from now on as described for latin1.
Caveat
Here's another issue that can pop up at times (see Available Characters with iso-8859-1). The Latin-1 encoding provides at slot 0xA5 (decimal 165) the yen character. According to the description above, the latin1 option to inputenc defines the \textyen translation for this, but the T1 output encoding reserves no slot for this, so inputting ¥ results in a runtime LaTeX error. One has to load a package providing a default output for \textyen, for instance textcomp. It would be the same with the utf8 input encoding.
Only characters covered by the output encoding or that are given a suitable rendering in terms of it can be safely input.
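The yen case can be sketched as follows, assuming a Latin-1 file; with a UTF-8 file only the inputenc option would change. The textcomp line is the fix suggested above. As far as I know, LaTeX releases from 2020 on provide the textcomp symbols in the kernel, so there the package is harmless but may no longer be needed.

```latex
% Assumption: this file is saved by the editor as Latin-1.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc} % maps byte 0xA5 to \textyen
\usepackage{textcomp}         % supplies a default output for \textyen
\begin{document}
100\,¥ and 100\,\textyen % both should print the yen sign
\end{document}
```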