[Tex/LaTex] Unicode -(U+301) error in biblatex, but not in main text: {\'{\i}}

biblatex latexmk unicode

When compiling my document embedding references using biblatex, I get the error message:

Package inputenc Error: Unicode char ́ (U+301)
(inputenc)                not set up for use with LaTeX.

With the help of the various unicode/biblatex questions on this site, I identified the character {\'{\i}} in one of the references as the culprit. Interestingly, setting {\'{\i}} in the main text does not throw an error message:

\begin{filecontents}{biblio.bib}
    @Article{Zheng2016,
        %author    = {Qinsi Zheng and Steffen Jockusch and Gabriel G. Rodr{\'{\i}}guez-Calero and Zhou Zhou and Hong Zhao and Roger B. Altman and H{\'e}ctor D. Abru{\~n}a and Scott C. Blanchard},
        author    = {Qinsi Zheng and Gabriel G. Rodr{\'i}guez-Calero and Steffen Jockusch and Zhou Zhou and Hong Zhao and Roger B. Altman and H{\'e}ctor D. Abru{\~n}a and Scott C. Blanchard},
        title     = {Intra-molecular triplet energy transfer is a general approach to improve organic fluorophore photostability},
        journal   = {Photochemical {\&} Photobiological Sciences},
        year      = {2016},
        volume    = {15},
        number    = {2},
        pages     = {196--203},
        doi       = {10.1039/c5pp00400d},
        publisher = {Royal Society of Chemistry ({RSC})},
    }

    @Article{Pennacchietti2018,
        author    = {Francesca Pennacchietti and Ekaterina O. Serebrovskaya and Aline R. Faro and Irina I. Shemyakina and Nina G. Bozhanova and Alexey A. Kotlobay and Nadya G. Gurskaya and Andreas Bod{\'{e}}n and Jes Dreier and Dmitry M. Chudakov and Konstantin A. Lukyanov and Vladislav V. Verkhusha and Alexander S. Mishin and Ilaria Testa},
        title     = {Fast reversibly photoswitching red fluorescent proteins for live-cell {RESOLFT} nanoscopy},
        journal   = {Nature Methods},
        year      = {2018},
        volume    = {15},
        number    = {8},
        month     = {jul},
        pages     = {601--604},
        doi       = {10.1038/s41592-018-0052-9},
        publisher = {Springer Nature America, Inc},
    }
\end{filecontents}



\documentclass[pdfa,a4paper,11pt,
                        bibliography=totoc,
                        numbers=noenddot,
                        abstracton,
                        twoside,openright,
                        parskip=half]{scrartcl}

\usepackage[english]{babel} % provides the dictionary for proper hyphenation
\frenchspacing % single space after full stop
\raggedbottom
\usepackage[utf8]{inputenc} % input encoding of the source files

\usepackage{filecontents}

\usepackage{csquotes} % needed for babel / polyglossia
\usepackage[
natbib = true, % allows usage of citet, citep etc. commands
citestyle = authoryear, bibstyle = authoryear, %
backend = biber, %
sortcites = true, % sorts multiple refs in one cite command
hyperref = true, %backref = true, %
giveninits = true, %
terseinits = false, % if true: D. E. => DE
%uniquelist = true,
maxbibnames = 30, maxcitenames = 2, %
uniquename = init, uniquelist = minyear, % uniquelist = minyear only cites 2nd author if first author and year are identical
date = year,
url = false, isbn = false]{biblatex} % package for the bibliography
\addbibresource{biblio.bib}
\usepackage{hyperref} % crossreferencing

\begin{document}

\section{Introduction}
\citep{Zheng2016}
\citep{Pennacchietti2018}

S\'{\i}

\printbibliography

\end{document}

Trying to solve the problem, I found different approaches on this site:

  • Using {\'i} as suggested in this answer works. However, for automatically imported bibliography entries, it's tedious to find all of the offending characters, especially when the error might occur with different combinations of precomposed characters as suggested here.

  • I therefore tried running Biber with the --output-safechars option as suggested in this answer. Compiling manually from the terminal, it seems to work OK.

  • However, I prefer to use latexmk for compilation (especially when the workflow requires multiple runs of various compilers). I then found this answer, explaining how to pass Biber options to latexmk. I created the file .latexmkrc in the local directory, containing the line $biber='biber --output-safechars';. This finally works.

I am afraid, however, that this whole workflow is beyond my boss's willingness to put up with the quirks of LaTeX.

So I guess I have two questions here:

1) Is there any way to remove the offending characters automatically? I found this answer, but am afraid that it's way beyond my understanding.

2) If there isn't, is there any way to force latexmk/Biber to compile such characters properly that does not require any additional files or setup? Ideally, I'm looking for some magic commands that I could "sneak in unnoticed" at the beginning of the .tex file.

Edit:
I just tested the workflow using the .latexmkrc on my whole document, which now throws an error

Undefined control sequence.
in the line just after the \printbibliography command. Apparently some entry in my 200+ bibliography clashes with the --output-safechars option.

I'll look into it, but it seems this workflow might not work for me in the end either.

Best Answer

With biblatex and Biber the best solution™ is of course to use the correct Unicode characters (and ideally the precomposed characters: Åström, not a combination of the combining characters: Åström) in the source.

author    = {Qinsi Zheng and Steffen Jockusch and Gabriel G. Rodríguez-Calero
             and Zhou Zhou and Hong Zhao and Roger B. Altman and Héctor D. Abruña
             and Scott C. Blanchard},

The benefit of this solution is that it is easier to read, just works and avoids the additional braces that BibTeX needs. Biber retains support for those braces for simplicity and backwards compatibility, but they could destroy kerning and are otherwise unnecessary for Biber. See How to write “ä” and other umlauts and accented letters in bibliography? for why they are needed for BibTeX.


If that is not possible and you can't replace {\'{\i}} with {\'i} in the source, you can try a sourcemap as shown in PLK's answer to Input encoding error after upgrading from Biber 1.9 to Biber 2.1.

The practical drawback of that approach is that you need to add a substitution rule for every possible problematic combination.

To offer some additional benefit over PLK's answer, the code below uses the new loop functionality to replace \`{\i}, \'{\i}, \^{\i} and \"{\i} (all Latin-1 dotless-i combinations) for (hopefully) all fields where it makes sense.

\documentclass{article}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{csquotes}
\usepackage[style = authoryear, backend = biber, maxbibnames=999]{biblatex}
\addbibresource{\jobname.bib}

\DeclareDatafieldSet{setall}{
  \member[datatype=literal]
  \member[datatype=name]
  \member[field=journal]% journal is special since it is
                        % actually journaltitle
}

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[overwrite, foreach={setall}]{
      % \`{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0300}},
            replace=\regexp{\x{00EC}}]
      % \'{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0301}},
            replace=\regexp{\x{00ED}}]
      % \^{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0302}},
            replace=\regexp{\x{00EE}}]
      % \"{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0308}},
            replace=\regexp{\x{00EF}}]
    }
  }
}

\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@article{itest,
  author  = {Lo{\"{\i}}c Rodr{\'{\i}}guez-Calero},
  title   = {Lor{\"{\i}}m {\'{\i}}psum and {\`{\i}}v{\^{\i}}n},
  journal = {Dol{\"{\i}}r s{\'{\i}}t},
  note    = {Am{\"{\i}}t cons{\'{\i}}ctur},
  date    = {2018},
}
@article{Zheng2016,
  author    = {Qinsi Zheng and Steffen Jockusch
               and Gabriel G. Rodr{\'{\i}}guez-Calero
               and Zhou Zhou and Hong Zhao and Roger B. Altman
               and H{\'e}ctor D. Abru{\~n}a and Scott C. Blanchard},
  title     = {Intra-molecular triplet energy transfer is a general
               approach to improve organic fluorophore photostability},
  journal   = {Photochemical {\&} Photobiological Sciences},
  year      = {2016},
  volume    = {15},
  number    = {2},
  pages     = {196--203},
  doi       = {10.1039/c5pp00400d},
}
\end{filecontents}

\begin{document}
\parencite{Zheng2016}
\cite{itest}

\printbibliography
\end{document}

Rodríguez-Calero, Loïc (2018). “Lorïm ípsum and ìvîn”. In: Dolïr sít. Amït consíctur.


Why is this Unicode business such an issue?

Unicode combines characters by adding the combining marks after the base glyph. LaTeX works exactly the other way round: The combining accents are added before the glyph (as a macro that gets the base glyph as argument).

Biber 'parses' the LaTeX character macros and converts them to Unicode characters for sorting and the like. The conversion uses a simple translation table from macros to Unicode code points and then applies the (complex) Unicode rules for combining them.

Combinations involving i are particularly complicated, since LaTeX usually bases its accented characters upon the 'dotless i' (\i, ı, U+0131) to avoid clashes of accent and tittle, whereas Unicode seems to prefer its combining sequences based on the 'small i' (i, U+0069); see http://unicode.org/faq/char_combmark.html#22. That means that \'i gets converted to í (U+00ED), but \'\i to ı́ (U+0131 + U+0301, a combination of the dotless i and the combining acute accent).

LaTeX's inputenc can only deal with a sensible subset of Unicode and fails to account for ı́ (U+0131 + U+0301) while it handles í (U+00ED) just fine.
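The difference can be reproduced without biblatex at all. The following is only a minimal sketch: loading T1 via fontenc is my assumption (the question's preamble does not load it), and the exact wording of the error depends on the LaTeX version.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}    % assumption: not part of the question's preamble
\usepackage[utf8]{inputenc}
\begin{document}
% LaTeX accent macros: both forms typeset an accented i.
S\'i and S\'{\i}

% Precomposed U+00ED: known to inputenc.
Sí

% Decomposed sequence U+0131 U+0301 (dotless i + combining acute).
% Uncommenting the next line should trigger an error of the
% "Unicode char ... (U+301) not set up for use with LaTeX" kind.
% Sı́
\end{document}
```

This is why the macro {\'{\i}} is harmless in the main text (LaTeX never sees a combining character there), while the same macro in the .bib file fails: Biber has already turned it into the decomposed Unicode sequence by the time LaTeX reads the .bbl file.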

See also PLK's explanation in the linked answer as well as comments in https://github.com/plk/biber/issues/65 and https://github.com/plk/biblatex/issues/819.


Another solution that needs no such tricks, but might not be compatible with your workflow, is to use a proper Unicode engine like LuaLaTeX or XeLaTeX and a font that has properly kerned accents (Linux Libertine, for example).
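A rough sketch of such a setup, to be compiled with lualatex or xelatex (plus Biber) instead of pdflatex. The family name Linux Libertine O is an assumption about how the font is installed on your system, and biblio.bib is the file from the question; how well the accent sits on the dotless i depends on the font.

```latex
% Compile with: lualatex (or xelatex), biber, lualatex, lualatex
\documentclass{article}
\usepackage[english]{babel}
\usepackage{fontspec}            % replaces inputenc/fontenc on Unicode engines
\setmainfont{Linux Libertine O}  % assumption: installed under this family name
\usepackage{csquotes}
\usepackage[style=authoryear, backend=biber, maxbibnames=999]{biblatex}
\addbibresource{biblio.bib}      % the .bib file from the question

\begin{document}
\parencite{Zheng2016}

\printbibliography
\end{document}
```

No inputenc is loaded here, so the combination U+0131 + U+0301 is passed straight to the font and no sourcemap is needed.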
