[Tex/LaTex] Do I need fontenc and inputenc

font-encodingsinput-encodings

My question feels like a duplicate of: fontenc vs inputenc, but in that question the OP uses German and non-ascii characters.

I speak and write in English exclusively. My keyboard is the standard en-GB layout making typing non-ascii characters difficult. I use latex and not xetex. My LateX files are 100% ascii. Every once in a while a citation I download into my .bib file contains a non-ascii character, but I have never had an issues "fixing" these. Do I need, or would I benefit from using, the fontenc and inputenc packages?

Best Answer

Font encoding

Because of deficiencies of the OT1 encoding I also recommend T1 font encoding:

\usepackage[T1]{fontenc}

For example OT1 contains inconsistencies in the encoding for different families, typewriter is different, … Also

\usepackage{textcomp}

is useful for getting other symbols (Euro, …). Instead of Computer Modern/EC I would use the new Latin Modern fonts that are a further development of the former:

\usepackage{lmodern}

Input encoding

Of course there are advantages using plain ASCII for the texts and using commands for the other characters. This way the text is quite independent from the encoding and can be more easily be used in different environments with different encodings.

But sometimes characters outside of ASCII might slip through. In case of OT1 the file compiles fine, but the characters are missing:

\documentclass{article}
% Default font encoding: OT1
\begin{document}
Umlauts via LICR: \"a\"o\"u\ss

Umlauts directly: äöüß
\end{document}

There is no warning or error, only the .log file says:

Missing character: There is no ä in font cmr10!
Missing character: There is no ö in font cmr10!
Missing character: There is no ü in font cmr10!
Missing character: There is no ß in font cmr10!

Switching to T1 encoding helps to make more characters available. However I do not know an editor that supports T1 encoding. Some positions match the slots in latin1, for example. But other characters are different or have other positions. The following example uses the ^^-notation to avoid problems with editing and copy&pasting, because the example uses two different encodings at the same time:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{lmodern}

\begin{document}

% Umlauts/Euro: äöüß/€

Umlauts/Euro via LICR: \"a\"o\"u\ss/\texteuro

Umlauts/Euro directly (latin9): ^^e4^^f6^^fc^^df/^^a4

Umlauts/Euro directly (UTF-8): ^^c3^^a4^^c3^^b6^^c3^^bc^^c3^^9f/^^e2^^82^^ac
\end{document}

This time, there is no hint even in the .log file.

Therefore I recommend using package inputenc with encoding ascii:

\usepackage[ascii]{inputenc}

Then non-ASCII input characters generate errors:

! Package inputenc Error: Keyboard character used is undefined
(inputenc)                in inputencoding `ascii'.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.10 Umlauts/Euro directly (latin9): ä
                                      öüß/€

After fixing the input lines this helps to ensure that the file is indeed ASCII.

Related Solutions

[Tex/LaTex] fontenc vs inputenc

The two packages address different problems.

inputenc allows the user to input accented characters directly from the keyboard;
fontenc is oriented to output, that is, what fonts to use for printing characters.

The two packages are not connected, though it is best to call fontenc first and then inputenc.

With \usepackage[T1]{fontenc} you choose an output font encoding that has support for the accented characters used by the most widespread European languages (German, French, Italian, Polish and others), which is important because otherwise TeX would not correctly hyphenate words containing accented letters.

With \usepackage[<encoding>]{inputenc} you can directly input accented and other characters. What's important is that <encoding> matches the encoding with which the file has been written and this depends on your operating system and the settings of your text editor.

If calling only

\usepackage[T1]{fontenc}

you seem to get correct output, then your files are probably encoded with Latin-1 (also called ISO 8859-1), but beware that the correspondence is not complete: for example, typing ß you'd get SS in output, which is obviously incorrect. Thus your editor might be set up for Latin-1 and so the correct call should be

\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}

How do the packages work? Let's do the example for these two encodings and the character ä.

First of all one must remember that TeX knows nothing about file encodings: all it really sees is the character number.

When you type ä in an editor set up for Latin-1, the machine stores character number 228.
When TeX reads the file it finds the character number 228 and the macros of inputenc transform this into \"a.
Now fontenc comes into action; the command \" has an associated table of the known accented characters the font has available, and ä is among these, so the sequence \"a is transformed into the command "print character 228" in the current (T1-encoded) font.

In this case the two coincide. This is not the case, for instance, of ß:

The machine stores character number 223
The macros of inputenc change this into \ss
fontenc transforms this into "print character 255" (where T1 encoded fonts have a ß character).

UTF-8

The situation is a bit different when \usepackage[utf8]{inputenc} is used (and the file is UTF-8 encoded, of course). When the text editor shows ä or ß, the file actually contains two byte sequences, respectively <C3><A4> and <C3><9F>

The first byte is a prefix that contains some information, the main one is that it introduces a two byte character. Now inputenc makes all legal prefixes active, so <C3> behaves like a macro; its definition is to look at the next character and then interpret, according to Unicode rules, the whole pair and transforms it into the corresponding code point, respectively U+00E4 and U+00DF.

Other prefixes announce three or four byte combinations, but the behavior is essentially the same: instead of one character more, two or three others are absorbed and the translation into a code point is performed.

In ot1enc.dfu and t1enc.dfu we find

\DeclareUnicodeCharacter{00DF}{\ss}

\DeclareUnicodeCharacter{00E4}{\"a}

Oh, wait! There's something more! Yes, in this case inputenc interacts with fontenc (which it doesn't for other input encodings): for every loaded encoding, the corresponding .dfu file (Unicode definitions) is read before the document starts. This is the reason why I prefer to always load fontenc before inputenc (though not really necessary).

Those declarations provide the necessary setup: the combinations <C3><A4> and <C3><9F> get translated into \"a and \ss respectively and everything works from now on as described for latin1.

Caveat

Here's another issue that can pop up at times (see Available Characters with iso-8859-1). The Latin-1 encoding provides at slot 0xA5 (decimal 165) the yen character. According to the description above, the latin1 option to inputenc defines the \textyen translation for this, but the T1 output encoding reserves no slot for this, so inputting ¥ results in a runtime LaTeX error. One has to load a package providing a default output for \textyen, for instance textcomp. It would be the same with the utf8 input encoding.

Only characters covered by the output encoding or that are given a suitable rendering in terms of it can be safely input.

[Tex/LaTex] LaTeX does not print words correctly: inputenc/fontenc problem

Don't use the ﬁ and ﬂ characters in the input, but write firms and fleets.

Also add the following "magic" line at the beginning of your file

% !TEX encoding = UTF-8 Unicode

This will ensure that TeXShop interprets your file as UTF-8.

If your text has already many instances of ﬁ and ﬂ, you can consider adding the following to your preamble:

\usepackage{newunicodechar}
\newunicodechar{ﬁ}{fi}
\newunicodechar{ﬂ}{fl}

but it's best to stick with normal input.

Accented characters will be treated correctly.

Here's an example:

% !TEX encoding = UTF-8 Unicode
\documentclass[a4paper,12pt]{article}
\linespread{1.5}
\usepackage[francais,english]{babel} 
\usepackage[utf8]{inputenc} 
\usepackage[T1]{fontenc} 
\usepackage[round]{natbib}
\usepackage{epigraph}
\usepackage{makeidx}
\usepackage{url}
\usepackage{color}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
\usepackage[nottoc]{tocbibind}
\setcounter{tocdepth}{12}
\usepackage{eurosym}

\usepackage{newunicodechar}
\newunicodechar{ﬁ}{fi}
\newunicodechar{ﬂ}{fl}

\usepackage{ragged2e}

\begin{document}

 These earlier firms, were far more powerful; they commanded armies and fleets

 These earlier ﬁrms, were far more powerful; they commanded armies and ﬂeets

 Garçon, été, l'Hôpital, Génève

\bibliographystyle{plainnat}
\bibliography{biblio.bib}
\printindex
\end{document}

enter image description here