Correct way to generate hyphenation patterns with BibLaTeX

biblatex hyphenation

What is the recommended way to enter text in BibLaTeX databases so that LaTeX can hyphenate the entries properly and avoid overfull lines caused by hyphens in compound words?

I suppose you have all faced the problem. You carefully edit the databases that contain the bibliographical information given to BibLaTeX. Of course, every piece of information is precious, and the more the better, one would argue. So you take care that every detail is present in your databases, and most of that information will be extracted by biber and therefore end up in a LaTeX file (within the bibliography section).

Of course, LaTeX has to typeset the content of your databases in order to make up the final page. Therefore, it has to hyphenate the content.

I am a native German speaker, and most of my documents are written in German. Therefore, my documents of course include

\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\usepackage[german]{csquotes}
\usepackage[backend=biber,style=numeric]{biblatex}

to set up the German language and get proper hyphenation patterns.

Most of the text in my documents uses the babel shorthand "= to insert a hyphen at which LaTeX may break the word while still considering the hyphenation points in the rest of the word. This looks something like

... das Corporate"=Design, welches als Kopier"=Vorlage ...

This works really well. In fact, typing "= instead of the normal hyphen (-) has become second nature to me. I use it everywhere.

But this does not work for the entries in my BibLaTeX databases. After the requested entries have been extracted and manipulated by BibLaTeX, they are once again compiled by LaTeX, but instead of the shorthand being replaced with a hyphen, the shorthand characters are copied into the final document.

On the other hand, inserting the regular hyphen into the BibLaTeX databases results in the normal LaTeX behaviour: the hyphen prevents LaTeX from hyphenating the part of the word that follows it. This will surely produce overfull hboxes!

Neither solution is satisfactory! Either you have words containing "= in the output, or you have overfull lines.

I thought that biber's only job is to extract the needed entries from the databases, manipulate them as dictated by the chosen style, and output everything into a LaTeX-like file, which is then interpreted by the latex (or pdflatex, lualatex, xelatex …) compiler again. While this is true to some extent, it is obviously not true when it comes to replacing the defined babel shorthands and hyphenating words that have been joined with such a shorthand. The trick that worked so well for the rest of the document suddenly stops working when the bibliography section is compiled.

So the question is: what is the correct way to safely enter some sort of hyphen-like character, shorthand, macro, …, that will be hyphenated and afterwards typeset by LaTeX to produce the high-quality output we all know and love?

EDIT

As mentioned by David Carlisle, using -\hspace{0pt} does the trick. LaTeX is now able to hyphenate after a -. But I am sad to say that this addition makes the entries less readable.


@Book{voss21:wissenschaftliche-arbeit-mit-latex,
  author     = {Voss, Herbert},
  title      = {Die wissenschaftliche Arbeit mit \LaTeX},
  indextitle     = {Wissenschaftliche Arbeit mit LaTeX, die},
  year       = 2021,
  month      = 6,
  subtitle   = {unter Verwendung von \LuaTeX{}, KOMA-\hspace{0pt}Script und
                    Biber/\BibLaTeX},
  series     = {DANTE-\hspace{0pt}Edition},
  note       = {Ist für Juni 2021 angekündigt.},
  langid     = {ngerman},
  pubstate   = {Zweite, überarbeitete und erweiterte Auflage},
  location   = {Berlin},
  publisher  = lob,
  abstract   = {\TeX{} wurde vor mehr als 35 Jahren für das
                    Erstellen von Dokumenten im wissenschaftlichen
                    Bereich erstellt. Anfänglich nur für Manuskripte von
                    mathematisch orientierten Büchern geschaffen, wurde
                    das Satzsystem \TeX{} sehr schnell als
                    prädestiniertes System für den gesamten
                    wissenschaftlichen Bereich erkannt. Mit dem neuen
                    \TeX-\hspace{0pt}Compiler \LuaTeX{}, welcher auf dem
                    traditionellen \TeX{} aufbaut, dem \TeX-\hspace{0pt}Format
                    \LaTeX{} und den Dokumentenklassen von KOMA-\hspace{0pt}Script
                    lassen sich wissenschaftliche Arbeiten für jeden
                    Bereich und in jeder Sprache erstellen. Die
                    wissenschaftliche Arbeit stellt nicht nur besondere
                    Anforderungen an die Art und Weise von
                    Literaturverweisen und der Ausgabe der Bibliografie,
                    sondern auch an typografische Gepflogenheiten. Mit
                    diesem Buch bekommt jeder viele Hinweise für das
                    Erstellen von wissenschaftlichen Arbeiten auf
                    höchstem Niveau.},
  keywords   = {book},
  pagetotal  = 448,
  edition    = 2,
  isbn       = {978-3-96543-217-8},
}

Ulrike Fischer suggested using \hyphen and \hyphenation, which BibLaTeX offers. Inserting \hyphen instead of a - produces an identical result.

Quoting from the BibLaTeX manual

langid field (identifier)
The language id of the bibliography entry. The alias hyphenation is provided for backwards compatibility. The identifier must be a language name known to the babel/polyglossia packages. This information may be used to switch hyphenation patterns and localise strings in the bibliography.

Localisation works well, but the hyphenation is my problem.

And please forgive me: I consider inserting \hspace{0pt} to be an ugly workaround at best. This can't be the long-term solution. Neither is switching from justified blocks to ragged lines, or being \sloppy when typesetting the bibliography.

In my eyes, it is inconsistent to have such a good hyphenation algorithm and then see it blocked, whatever the reason might be.

(This question might sound similar to an earlier question of mine. That is true. But as that particular question is still not marked as solved and has only one answer, which specifically requires LuaLaTeX, I'd be pleased if someone could advise me in a more general way on how to solve my problem.)

Best Answer

As far as I can see, your problem is not that hyphenation and hyphenation patterns in general do not work properly. Your overall problem is that words containing hyphens cannot be hyphenated elsewhere (see e.g. Adequate hyphenation of words already containing a hyphen), and specifically your problem is that the babel-german solution "= does not quite work with biblatex (as also documented in your earlier question babelshorthand "= does not work with BibLaTeX?).

It might be instructive to briefly discuss why "= does not work. "= relies on " being a shorthand (an active character) and thus ultimately on category codes. At least for me, catcodes are one of the trickier aspects of TeX programming. Essentially (but quite possibly not entirely correctly), TeX assigns each character its category code at the moment it reads it. That category code is then frozen and cannot be changed afterwards. So the category code settings in force when code is "read" determine how it behaves later on, even if the category codes of the characters involved are changed later.

biblatex reads the contents of your bibliography items from the .bbl file at \begin{document}. This is done so that all data is available throughout the whole document. Crucially the file is read before babel selects the document language. In particular even if your document language is ngerman, where " is an active character/shorthand, the .bbl file is read at a time when " is still a normal character and not at all a shorthand.
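The same effect can be reproduced without biblatex. The following is a minimal sketch, assuming pdfLaTeX with babel's ngerman option; the macro names \earlytext and \latetext are made up for this illustration. Both macros contain the same source text, but they are read under different category code settings.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}

% Read in the preamble: " is still an ordinary character here,
% so the macro stores the two plain characters " and =.
\newcommand*{\earlytext}{Online"=Dokumentation}

\begin{document}
% Read in the document body: " is now an active shorthand character,
% so the macro stores the babel shorthand "=.
\newcommand*{\latetext}{Online"=Dokumentation}

\earlytext\par % expected: the characters "= appear literally in the output
\latetext      % expected: Online-Dokumentation
\end{document}
```

The .bbl file is in the same position as \earlytext: its contents are read before " becomes active.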

In fact, "= works fine if we make " a shorthand before \begin{document}, because then " is an active character when the .bbl file is read (I cannot tell you if this has any bad side-effects later on).

\documentclass[english,ngerman]{scrartcl}

\usepackage[style=numeric]{biblatex}
\usepackage{babel}
\usepackage{csquotes}
\usepackage{dtk-logos}

\begin{filecontents}{\jobname.bib}
@Manual{class:scrguide,
  title      = {KOMA-Script},
  author     = {Kohm, Markus},
  month      = May,
  year       = 2016,
  url        = {http://www.komascript.de/~mkohm/scrguide.pdf},
  langid     = {ngerman},
  note       = {Bestandteil der Online"=Dokumentation von
                  \TeXLive, Datei \url{scrguide.pdf}},
  keywords   = {manual},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\shorthandon{"}
\begin{document}
Der Eintrag~\cite{class:scrguide} aus meiner
Literatur"=Datenbank erscheint im Quellen"=Verzeichnis leider mit
einem \verb|"=| in der Ausgabe.

\printbibliography
\end{document}

Markus Kohm. KOMA-Script.

As Ulrike Fischer pointed out in the comments, biblatex provides the command \hyphen as a replacement for "= that does not rely on (non-standard) category codes and thus works out of the box with all language settings.

\documentclass[english,ngerman]{scrartcl}

\usepackage[style=numeric]{biblatex}
\usepackage{babel}
\usepackage{csquotes}
\usepackage{dtk-logos}

\begin{filecontents}{\jobname.bib}
@Manual{class:scrguide,
  title      = {KOMA-Script},
  author     = {Kohm, Markus},
  month      = May,
  year       = 2016,
  url        = {http://www.komascript.de/~mkohm/scrguide.pdf},
  langid     = {ngerman},
  note       = {Bestandteil der Online\hyphen Dokumentation von
                  \TeXLive, Datei \url{scrguide.pdf}},
  keywords   = {manual},
}
@Book{voss21:wissenschaftliche-arbeit-mit-latex,
  author     = {Voss, Herbert},
  title      = {Die wissenschaftliche Arbeit mit \LaTeX},
  indextitle     = {Wissenschaftliche Arbeit mit LaTeX, die},
  year       = 2021,
  month      = 6,
  subtitle   = {unter Verwendung von \LuaTeX{}, KOMA\hyphen Script und
                    Biber/\BibLaTeX{} und anderem mehr mehr},
  series     = {DANTE\hyphen Edition},
  langid     = {ngerman},
  location   = {Berlin},
  keywords   = {book},
  pagetotal  = 448,
  edition    = 2,
  isbn       = {978-3-96543-217-8},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\begin{document}
Der Eintrag~\cite{class:scrguide,voss21:wissenschaftliche-arbeit-mit-latex} aus meiner
Literatur"=Datenbank erscheint im Quellen"=Verzeichnis leider mit
einem \verb|"=| in der Ausgabe.

\printbibliography
\end{document}

Output: two bibliography entries; one of them hyphenates "DANTE-Edition" as "DANTE-Edi-" / "tion" across a line break.

Note how "DANTE-Edition" is hyphenated correctly according to German rules even though it contains a hyphen.

If you prefer typing "= over \hyphen, you may be able to post-process your file and replace every "= with \hyphen (followed by a space or {} so that the command name ends before the next letter).
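One way to do this kind of post-processing without touching the .bib file itself might be a Biber source map. The following preamble fragment is an untested sketch, assuming Biber is the backend. The field list (title, subtitle, series, note) is only an illustrative choice and would have to be extended to every field that may contain "=.

```latex
% Goes into the preamble, after biblatex has been loaded.
\usepackage[backend=biber,style=numeric]{biblatex}

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    % Loop over the listed fields (assumed list, adjust as needed).
    \map[overwrite, foreach={title,subtitle,series,note}]{
      % Replace every "= by \hyphen{} while Biber reads the .bib file;
      % the trailing {} ends the command name before the next letter.
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{"=},
            replace=\regexp{\\hyphen\{\}}]
    }
  }
}
```

With such a map the database can keep the familiar "= notation, while the .bbl file only ever contains \hyphen{}.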

biblatex's \hyphen is defined as \nobreak-\nobreak\hskip\z@skip and so pretty much ends up doing what David Carlisle suggested in the comments.

Corporate\hyphen Design

is essentially

Corporate-\hspace{0pt}Design
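
To see the difference between the variants side by side, a small test document can help. This is only a sketch: the 2cm box width and the sample word are arbitrary choices made so that the compound does not fit on one line.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\usepackage{biblatex}% provides \hyphen

\begin{document}
% Plain hyphen: a break is only possible directly after the hyphen,
% so "Dokumentation" cannot be hyphenated and should stick out.
\noindent\parbox{2cm}{Die Online-Dokumentation}\par\medskip

% Zero-width glue after the hyphen: the second part starts a new
% "word" for TeX and can be hyphenated by the ngerman patterns.
\noindent\parbox{2cm}{Die Online-\hspace{0pt}Dokumentation}\par\medskip

% biblatex's \hyphen: expected to behave like the previous variant.
\noindent\parbox{2cm}{Die Online\hyphen Dokumentation}
\end{document}
```

The first box should trigger an overfull \hbox warning, while the other two boxes are expected to break inside "Dokumentation".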