[Tex/LaTex] [Babel/Polyglossia]: wrong hyphenation

babel hyphenation polyglossia

Consider the following MWE; say you save it in foo.tex.

\documentclass{article}

\setcounter{secnumdepth}{0}

\usepackage{polyglossia}
\setmainlanguage{english}
\setkeys{english}{variant=british}

% \usepackage[UKenglish]{babel}

\begin{document}

\section{Report of the PRESIDENTIAL COMMISSION on the Space Shuttle
Challenger
Accident}

\subsection{Volume 2: Appendix F - Personal Observations on Reliability
of
Shuttle}

by R. P. Feynman

For a successful technology, reality must take precedence over public
relations, for nature cannot be fooled.
\end{document}

Compile it with $ xelatex foo.tex. In the resulting PDF, the word "Reliability" is incorrectly split as Reliab-ility. According to the Oxford Advanced Dictionary (dead-tree version), the correct hyphenation is re-li-abil-ity. Commenting out the \usepackage{polyglossia} line and the two lines after it, and uncommenting the babel line, gives the same result.

Now comment out the \setkeys line (i.e. use "normal" English). The word is now split correctly as Relia-bility. Using babel with non-UK English also results in correct splitting.

Is this a typo in both polyglossia and babel, or in some file relied on by both packages (perhaps the file for UK-specific language settings)? Help in debugging this is appreciated.
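One way to start debugging is to ask TeX directly which break points each pattern set allows. The following is only a minimal sketch, and it assumes pdflatex with babel and an installation that ships both the US and UK English patterns; under xelatex, \showhyphens may need extra care depending on the font setup. The break points are written to the terminal and the .log file, not to the PDF.

```latex
% Sketch: compare the break points allowed by the US and UK patterns.
% Assumption: compiled with pdflatex; results appear in the .log file.
\documentclass{article}
% The last option (american) becomes the main language.
\usepackage[UKenglish,american]{babel}
\begin{document}
\showhyphens{reliability}   % break points under the US English patterns
\selectlanguage{UKenglish}
\showhyphens{reliability}   % break points under the UK English patterns
\end{document}
```

If the two log entries differ, the break comes from the pattern files themselves rather than from babel or polyglossia, which only select which patterns are active.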

Best Answer

while i agree that the hyphenation is unfortunate and wrong, i happen to have in my possession a printed copy of the dictionary from which the british hyphenation patterns were generated (The Oxford Minidictionary of Spelling and Word Division, the clarendon press, 1986). it contains the word, divided as reported.

here is the relevant page.

[image: scan of dictionary page]

two levels are indicated for the desirability of hyphenation:

  • primary/preferred -- solid bars
  • secondary -- broken bars

i'm thinking that perhaps some "known errors" were inserted by the publisher to trap an unscrupulous individual who would plagiarize the content, and publish an identical word list, and some tex users have just happened to hit some of the "bad examples". (inclusion of dummy entries is common practice in commercial mailing lists, for example.) i'm going to try to investigate this possibility.
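since the patterns faithfully reproduce the printed dictionary, a local override may be the practical route in the meantime. the following is only a sketch: it reuses the preamble from the question and takes the break points quoted there from the Oxford Advanced Dictionary, so substitute whichever division you prefer. the exception is declared after \begin{document} so that it is attached to the language active in the document body.

```latex
% Sketch: override the pattern-derived break points for a single word.
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{english}
\setkeys{english}{variant=british}   % as in the question's MWE
\begin{document}
% Exception for the currently active language only.
% Break points are the ones quoted in the question, not from the patterns.
\hyphenation{re-li-abil-ity}

\section{Personal Observations on Reliability of Shuttle}
\end{document}
```

an exception declared this way covers only that exact word form, so derived forms would need their own entries. for a single occurrence, a discretionary hyphen written in the word itself (\-) is an alternative.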
