[Tex/LaTex] Why is caron treated differently than breve

biblatex unicode

I'm using biblatex and wanted to add the author Kruzhkov to my bibliography. As far as I can see, his name is written Kru\u{z}kov.

Writing the name that way leads to this error: inputenc Error: Unicode char \u8: not set up for use with LaTeX

If I write Kru\v{z}kov instead, everything works fine.
I'm using the inputenc package with [utf8] option.
Any ideas?

Best Answer

MWE and test file that shows the problem, if compiled without LuaTeX/XeTeX:

\documentclass{article}

\usepackage{ifluatex,ifxetex}
\ifluatex
  \usepackage{fontspec}
\else
  \ifxetex
    \usepackage{fontspec}
  \else
    \usepackage[utf8]{inputenc}
  \fi
\fi

\usepackage[
  backend=biber,
]{biblatex}

\begin{filecontents}{\jobname.bib}
@book{foobar,
  author         = {Kru\u{z}kov and Kru\v{z}kov},
  title          = {About foobar},
  year           = {1970},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}
\cite{foobar}
\printbibliography
\end{document}

Biber normalizes its input data to NFD UTF-8, in which all accented characters are decomposed into a base character followed by combining accents. From its documentation:

3.2 Unicode

Biber uses NFD UTF-8 internally. All data is converted to NFD UTF-8 when read. If UTF-8 output is requested (to .bbl for example), the UTF-8 will always be NFC.

In the final file \jobname.bbl, the decompositions are replaced by equivalent precomposed characters, if these exist.

LaTeX → internal biber (NFD UTF-8) → output of biber (NFC UTF-8)

\v{z} → U+007A (z) U+030C (combining caron) → U+017E (latin small letter z with caron)

\u{z} → U+007A (z) U+0306 (combining breve) → U+007A U+0306

Thus \u{z} remains decomposed, because Unicode has no precomposed "z with breve". This is a serious problem, because TeX cannot easily handle combining accents that follow the base symbol. By the time the accent is seen, the base symbol has usually already been typeset and the accent can no longer modify it. Even worse, TeX does not even know which symbol was the base.

Package ucs can handle combining accents to some degree by looking ahead for them, but it is not compatible with package biblatex. Also, I could not get \u{z} working with it, probably because a precomposed character does not exist for it.

LaTeX's utf8.def for package inputenc cannot handle them.

The following options remain:

  • Using \v{z} instead of \u{z}, probably the correct spelling anyway according to the comments.

  • Compiling with LuaTeX or XeTeX; the example runs with both, because they can handle Unicode combining accents.

  • Option safeinputenc for package biblatex, see pst's answer.

  • Using bibtex instead of biber as backend.
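
For the pdfLaTeX case, the last two options could look roughly like the sketch below. It is the test file from above with the engine switch removed and the biblatex option safeinputenc added. The comment describing what safeinputenc does is my reading of the biblatex manual rather than something verified here, and the commented-out line shows the bibtex backend as an alternative.

```latex
% Sketch only: pdfLaTeX variant of the test file with a workaround enabled.
\documentclass{article}

\usepackage[utf8]{inputenc}

\usepackage[
  backend=biber,
  safeinputenc, % assumption: makes biber write ASCII with LaTeX accent
                % macros to the .bbl instead of raw UTF-8
]{biblatex}
% Alternative to the block above (use one or the other, not both):
% \usepackage[backend=bibtex]{biblatex}

\begin{filecontents}{\jobname.bib}
@book{foobar,
  author         = {Kru\u{z}kov and Kru\v{z}kov},
  title          = {About foobar},
  year           = {1970},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}
\cite{foobar}
\printbibliography
\end{document}
```

When switching between backends or options, delete the old auxiliary files (.bbl, .bcf, .aux) first, so that a stale .bbl containing the decomposed character is not read again.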
