[TeX/LaTeX] How to write “ä” and other umlauts and accented letters in a bibliography


How can I write the letter "a" with two dots above it in a bibliography (using the natbib package)? Specifically, I mean the word Birkhäuser.

Is there a general rule or method for writing such umlauts and other accented letters in bibliographies?

Best Answer

To typeset accented characters inside bibliography fields for processing with BibTeX, encase them in curly braces. To list but a few accented characters:

{\"a} {\^e} {\`i} {\.I} {\o} {\'u} {\aa} {\c c} {\u g} {\l} {\~n} {\H o} {\v r} {\ss} {\r u}

[Screenshot: the accented characters listed above, as typeset by LaTeX]

The word Birkhäuser should therefore be entered as Birkh{\"a}user.

Just to provide a somewhat more involved case: the name Jaromír Kovářík should be entered as either Jarom{\'i}r Kov{\'a}{\v r}{\'i}k or, more succinctly, Jarom{\'i}r Kov{\'a\v r\'i}k. As explained in greater detail below, BibTeX will then sort the surname Kovářík as if it were spelled Kovarik, i.e., without any "accented characters". This matters if the bibliography's entries are sorted alphabetically by authors' surnames and the bibliography contains entries with surnames such as Kovářík, Kovács, Kowalski, and Kowatski.
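To try out both spellings, a minimal test document along the following lines should do. It is a sketch under stated assumptions: the entry keys, titles, journal, years, and the author "Anna Author" are hypothetical placeholders; the .bib file is written on the fly with the filecontents* environment (part of the LaTeX kernel since the 2019 release); and natbib with plainnat is used only because the question mentions natbib. Compile with pdflatex, then bibtex, then pdflatex twice.

```latex
% Sketch: all entries are hypothetical placeholders; only the braced-accent
% spellings Birkh{\"a}user and Kov{\'a\v r\'i}k come from the answer text.
\begin{filecontents*}[overwrite]{\jobname.bib}
@book{placeholder-book,
  author    = {Anna Author},
  title     = {A Placeholder Book Title},
  publisher = {Birkh{\"a}user},
  year      = {2000}
}
@article{placeholder-article,
  author  = {Jarom{\'i}r Kov{\'a\v r\'i}k},
  title   = {A Placeholder Article Title},
  journal = {Placeholder Journal},
  year    = {2000}
}
\end{filecontents*}
\documentclass{article}
\usepackage{natbib}
\begin{document}
% Cite every entry in the .bib file so that both appear in the list.
\nocite{*}
\bibliographystyle{plainnat}
\bibliography{\jobname}
\end{document}
```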

Addendum: There is an obvious follow-up question to the "How does one enter a special character for use in BibTeX?" question: Why is it necessary to encase these "special characters" in this manner? Or: Why are the ordinary methods of entering these characters in a LaTeX document -- say, \"{a} or \"a, let alone ä -- not quite right for BibTeX?

There are two separate reasons for this requirement.

  1. If you use double-quotes, i.e., " ... ", to delimit the contents of a bibliographic field, you will find that writing
    author = "Anna H\"{a}user",

generates a BibTeX error, whereas

    author = "Anna H{\"a}user",

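does not. That is, BibTeX isn't smart enough on its own to distinguish between the two uses of the " character: in a field delimited by double quotes, it treats the first " that isn't enclosed in braces as the end of the field. In the first example it therefore reads the field as Anna H\ and then reports an error when it meets the rest; enclosing \"a in braces hides the " from this check.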

  2. In addition, the contents of bibliographic fields (certainly the author and editor fields, but potentially other fields as well, including the title, booktitle, and organization fields) are frequently used to sort entries alphabetically.

How does BibTeX sort characters with umlauts, diacritics, and other special features relative to the basic 26 letters of the Latin alphabet? How is one supposed to sort three authors named, say, Peter Hauser, Anna Häuser, and John Hill? For some pretty sound reasons, which are far too old and obscure to discuss in adequate detail here (to explore them properly, you need Appendix C of The TeXbook at hand), BibTeX was designed to "purify" the contents of various fields for sorting purposes; the BibTeX function that does this job really is called purify$! The method conforms, probably not surprisingly, to US and UK sorting conventions; as I note below, it needn't be "correct" outside English-speaking regions. It works as follows:

  • {\"a}, {\'a}, {\^a}, etc are all made equivalent to a,
  • {\"o}, {\'o}, {\H o} and {\o} are all made equivalent to o,
  • {\l} and {\L} become equivalent to l and L, respectively,
  • {\ss} becomes equivalent to ss,
  • {\aa} becomes equivalent to aa,
  • and so on for all other "accented" characters,
  • finally, any characters that do not fit into this scheme, including a directly typed ä, are sorted after z, because their character codes are higher than that of z. This may seem arbitrary and ill-informed from today's vantage point, but when BibTeX was created in the mid-1980s, the only relevant character encoding and sorting system was ASCII.

As you can immediately appreciate, this "purification" step is greatly simplified and made more robust if the "accented" characters are all entered consistently in the manner suggested in the first part of this answer.

Turning to the earlier case of the three authors named Peter Hauser, Anna Häuser, and John Hill: how will they appear in a bibliography whose entries are sorted alphabetically by the authors' surnames? If Anna's last name is entered as H{\"a}user, it is purified to Hauser; since the two surnames are then identical, the tie is broken (typically by first name), and the three authors end up listed as Häuser, A.; Hauser, P.; Hill, J. In contrast, if Anna's last name had been entered as Häuser, the sorting order would have been Hauser; Hill; Häuser. For most English-speaking readers, the second ordering will look completely wrong.
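The three-author example can be reproduced with a test document such as the one below. This is a sketch assuming natbib with plainnat, as in the question. The entry keys, titles, and the shared year are hypothetical placeholders. All three entries deliberately get the same year so that the year cannot influence the order. To see the second ordering, replace H{\"a}user with a directly typed Häuser (file saved as UTF-8) and rerun pdflatex, bibtex, and pdflatex.

```latex
% Sketch: keys, titles and the shared year are hypothetical placeholders.
% To compare with a directly typed umlaut, change H{\"a}user to Häuser
% below and rerun pdflatex, bibtex, pdflatex.
\begin{filecontents*}[overwrite]{\jobname.bib}
@misc{hauser-peter,
  author = {Peter Hauser},
  title  = {Placeholder Title One},
  year   = {2000}
}
@misc{hauser-anna,
  author = {Anna H{\"a}user},
  title  = {Placeholder Title Two},
  year   = {2000}
}
@misc{hill-john,
  author = {John Hill},
  title  = {Placeholder Title Three},
  year   = {2000}
}
\end{filecontents*}
\documentclass{article}
\usepackage{natbib}
\begin{document}
\nocite{*}
% plainnat sorts the entries by their (purified) author names
\bibliographystyle{plainnat}
\bibliography{\jobname}
\end{document}
```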

Some specialists from, say, Sweden, may object that this approach to sorting characters that aren't among the basic 26 letters of the Latin alphabet doesn't meet their national standards. [I obviously don't mean to pick on any Swedes. I mention them because I remember reading that in the Swedish alphabet, ä comes after z and hence is definitely not equivalent to a, not even for sorting purposes.] My answer to this objection is: if you're a Swedish author writing in Swedish for a Swedish audience, you had better conform to Swedish customs. On the other hand, if you're a Swede writing in English for a journal that's published exclusively in English, it'll do you no good at all to insist on Swedish sorting customs in your paper's bibliography. Of course, BibTeX's inability to adapt easily to non-English sorting customs is one of the reasons for the development and adoption of BibLaTeX and Biber. However, that's a topic for another day, isn't it?

The issue of how BibTeX sorts bibliographic entries (as well as many other fascinating [!] issues) is examined at length and explained admirably in the surprisingly readable (given the enormous dryness of the subject!) essay Tame the BeaST by Nicolas Markey. If you have TeX Live or MiKTeX as your TeX distribution, you can also access this document by typing "texdoc tamethebeast" at a command prompt.

For the sake of completeness and replicability, here's the MWE that produces the screenshot shown above. Note that it's not necessary to load any extra packages to typeset the accented characters considered in this example. However, if you use pdfLaTeX to compile your document, you will need to load the fontenc package with the option T1 to typeset, say, an ogonek-accented character such as {\k a}, or the Icelandic "thorn", {\th}.

{\"a} {\^e} {\`i} {\.I} {\o} {\'u} {\aa} {\c c} {\u g} {\l} {\~n} {\H o} {\v r} {\ss} {\r u}
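The line above is only the body of the MWE. A complete document around it might look as follows; this is a reconstruction, since the original preamble isn't preserved. The fontenc line with the T1 option is needed only for the two extra characters on the last line, {\k a} and {\th}, as explained above; if you don't need those two, drop them together with the fontenc line.

```latex
\documentclass{article}
% T1 encoding is needed only for the ogonek ({\k a}) and thorn ({\th})
% on the last line; the other accents work with the default OT1 encoding.
\usepackage[T1]{fontenc}
\begin{document}
{\"a} {\^e} {\`i} {\.I} {\o} {\'u} {\aa} {\c c} {\u g} {\l} {\~n} {\H o} {\v r} {\ss} {\r u}

{\k a} {\th}
\end{document}
```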