[Tex/LaTex] Biber writes incorrect Unicode encoding of \'\i

biber unicode

Biber seems to write strange Unicode characters when reading bib entries in which the accented character í has been entered as \'\i. This breaks compilation with pdflatex (and utf8 input), and produces strange characters or random errors with some fonts in xelatex. How can I get biber to produce the correct character?

A minimal example is below:

\documentclass{article}

\begin{filecontents}{test.bib}
@article{sample,
    Author = {Ver{\'o}nica Mac{\'\i}as},
    Title = {My Sample Paper},
    Journal = {Journal of Sample Documents},
    Year = {2011}}
\end{filecontents}

\usepackage{iftex}
\ifXeTeX
  \usepackage{fontspec}
  \setromanfont[Mapping=tex-text]{Minion Pro}
\else
  \usepackage[utf8]{inputenc}
\fi

\usepackage[backend=biber]{biblatex}
\addbibresource{test.bib}

\begin{document}
Verónica Macías \cite{sample}
\printbibliography
\end{document}

If I run

pdflatex test
biber test
pdflatex test

I get the error

! Package inputenc Error: Unicode char \u8:́ not set up for use with LaTeX.

If instead (after cleaning auxiliary files) I run:

xelatex test
biber test
xelatex test
xelatex test

then I randomly get either

** ERROR ** Charstring too long: gid=1679

Or compilation succeeds, but the output looks like

[screenshot of the garbled output; image not available]

How can I convince biber to produce the correct Unicode character for í?

P.S. Yes, I know that I could solve the problem by changing the encoding of the bib file, but assume that I don't want to do this because that file is being automatically generated elsewhere, and that's what I get.

Best Answer

Use the correct syntax:

Author = {Ver{\'{o}}nica Mac{\'{i}}as}

Or directly

Author = {Verónica Macías}

ensuring that the file is UTF-8 encoded. This works with both pdflatex and xelatex. The \i after an accent has not been required in LaTeX for several years, but the braces around the accented letter have been part of BibTeX syntax from the beginning.

If it's really not possible to modify the bib file, the following hacks seem to work:

XeLaTeX:

\begingroup\lccode`\~=\string"0131
  \lowercase{\endgroup\protected\def~}#1{\char\string"00ED}
\catcode\string"0131=\active

or, more simply,

\catcode`ı=13 \protected\defı#1{í}

PdfLaTeX:

\DeclareUnicodeCharacter{0131}{í}
\DeclareUnicodeCharacter{0301}{}

Something more complicated will be needed if you also have \`\i somewhere, because these hacks turn every dotless i into í no matter which accent follows it.
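
For reference, here is a sketch of where the hacks could go in the preamble of the example document from the question. It is untested and makes some assumptions: test.bib is the unmodified file from the question, the Minion Pro font setup is dropped so the default font is used, and the only problematic sequence in the bibliography is \'\i (dotless i, U+0131, followed by the combining acute accent, U+0301).

```latex
\documentclass{article}

\usepackage{iftex}
\ifXeTeX
  \usepackage{fontspec}
  % Make U+0131 (dotless i) active. It swallows the next token,
  % assumed to be the combining acute U+0301, and prints í instead.
  \catcode`ı=13 \protected\defı#1{í}
\else
  \usepackage[utf8]{inputenc}
  % Map dotless i to í and make the combining acute print nothing.
  \DeclareUnicodeCharacter{0131}{í}
  \DeclareUnicodeCharacter{0301}{}
\fi

\usepackage[backend=biber]{biblatex}
\addbibresource{test.bib} % assumed: the bib file from the question

\begin{document}
Verónica Macías \cite{sample}
\printbibliography
\end{document}
```

Note that both definitions are global to the document, so a dotless i typed directly in the body text would also come out as í. Keeping the hack and fixing the bib file later are not mutually exclusive: once the entries use \'{i}, the extra lines can simply be removed.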