Swedish characters (åäö) in glossaries entries break because the file is not saved as UTF-8

glossaries unicode

This error is driving me insane because I cannot find a fix for it. The glossaries package works fine, except when the entries contain Swedish characters (åäö). I think I have found the root cause: makeglossaries outputs the files in ISO-8859-1 instead of UTF-8, and when LaTeX reads the glossary files it assumes they are UTF-8, which is where things break.

Minimal example, main.tex:

\documentclass[12pt,a4paper,twoside]{report}

\usepackage[utf8]{inputenc}
\usepackage[swedish]{babel}
\usepackage[T1]{fontenc}


\usepackage[nonumberlist]{glossaries}
\makeglossaries
\loadglsentries{ordlista}

\begin{document}

\glsaddall
\printglossaries

\end{document}

ordlista.tex:

\newglossaryentry{This works}
{
    name=This works,
    description={This works fine}
}

\newglossaryentry{Swedish åäö}
{
    name=Swedish åäö,
    description={Swedish characters in the entry breaks things}
}

I compile the document with pdflatex main. Then I run makeglossaries main with the following output:

$ makeglossaries main
makeglossaries version 2.20 (2016-12-16)
added glossary type 'main' (glg,gls,glo)
makeindex  -s "main.ist" -t "main.glg" -o "main.gls" "main.glo"
This is makeindex, version 2.15 [TeX Live 2016] (kpathsea + Thai support).
Scanning style file ./main.ist.............................done (29 attributes redefined, 0 ignored).
Scanning input file main.glo....done (2 entries accepted, 0 rejected).
Sorting entries....done (2 comparisons).
Generating output file main.gls....done (9 lines written, 0 warnings).
Output written in main.gls.
Transcript written in main.glg.

Then, when running pdflatex main again, a lot of errors are spewed out:

l.4 \glossentry{Swedish }
                            {\glossaryentrynumbers{\relax
! Missing ) inserted for expression.
<to be read again> 


(snip)

l.5 ...etentrycounter[]{page}\glsnumberformat{1}}}
                                                  \glsgroupskip

Package glossaries Warning: Glossary entry `Swedish \GenericError {(inputenc)  
              }{Package inputenc Error: Unicode char  (U+5)
(glossaries)                not set up for use with LaTeX}{See the inputenc pac
kage documentation for explanation.}{Your command was ignored.
(glossaries)                Type  I <command> <return>  to replace it with anot
her command,
(glossaries)                or  <return>  to continue without it.}' has not bee
n defined on input line 5.

So, the glossary files:

$ file main.g*
main.glg: Makeindex log file, ASCII text
main.glo: LaTeX raw glossary, ISO-8859 text
main.gls: LaTeX document, ISO-8859 text

This is what main.glo looks like:

\glossaryentry{This works?\glossentry{This works}|setentrycounter[]{page}\glsnumberformat}{1}
\glossaryentry{Swedish åäö?\glossentry{Swedish åäö}|setentrycounter[]{page}\glsnumberformat}{1}

And that's the problem. If I manually edit main.glo, replace the junk with the correct characters, and save it as UTF-8, the problem goes away. My guess is that makeglossaries is to blame, as it saves the *.gl* files as ISO-8859 instead of UTF-8.

Question: How do I force makeglossaries to always output the text files in UTF-8?

Or: How do I tell the glossaries package that the *.glo file is encoded in ISO-8859 instead of UTF-8 (which it assumes)?

Best Answer

You have quite an old TeX Live (2016), with makeglossaries version 2.20.

Support for UTF-8 has improved a lot in recent years, and in a current TeX Live 2021/2022 (pretest) it works fine.

Consider upgrading.
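
If upgrading is not possible right away, a commonly suggested workaround is to keep the entry label and the sort key ASCII-only and put the Swedish characters only in the name field. The warning in the question complains about the entry label, and the label and sort value are what get written to main.glo. What follows is only a sketch under that assumption, not something tested on TeX Live 2016; the label swedish and the sort value are made up for illustration.

```latex
% ordlista.tex (sketch): ASCII-only label, Swedish characters only in name
\newglossaryentry{swedish}% label: plain ASCII (illustrative name)
{
    name={Swedish åäö},
    sort={Swedish aao},% assumed ASCII sort key, so makeindex never sees åäö
    description={Swedish characters appear in the name only}
}
```

The entry would then be referenced by its label, e.g. \gls{swedish}. Note that an ASCII sort key like this does not give proper Swedish alphabetical order, where å, ä and ö come after z.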
