[TeX/LaTeX] Sorting index entries with accented words

indexing sorting

Friends, I've been struggling for some time with this disturbance in the TeX force:

I have a set of Portuguese words, some of which are accented. When sorting them, we treat accented letters the same way as their unaccented counterparts. So, a list with these words:

abacate
ábaco
alavanca
árvore
arte
ácaro
aba

is sorted as

aba
abacate
ábaco
ácaro
alavanca
arte
árvore
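A minimal sketch of this sorting rule, written in Python rather than TeX purely for illustration. It assumes that "accentless counterpart" can be approximated by Unicode NFD decomposition with the combining marks dropped; the helper name `strip_accents` is hypothetical and not part of any TeX tool.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return the word with combining marks removed (e.g. 'á' becomes 'a')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["abacate", "ábaco", "alavanca", "árvore", "arte", "ácaro", "aba"]

# Sort on the accent-free form; the original word is only a tie-breaker.
ordered = sorted(words, key=lambda w: (strip_accents(w).lower(), w))
print(ordered)
```

The accent-free form is used only as the comparison key, so the printed words keep their accents.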

When I use these words as index entries, makeindex sorts the accented letters as symbols:

\begin{filecontents*}{mystyle.ist}
headings_flag 1
heading_prefix "\\textit{"
heading_suffix "}\\nopagebreak\n"
delim_0 " \\dotfill "
delim_1 " \\dotfill "
delim_2 " \\dotfill "
\end{filecontents*}

\documentclass{memoir}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

\usepackage{imakeidx}[2012/05/09]

\def\pfill{\unskip~\dotfill\penalty500 
  \strut\nobreak\dotfil~\ignorespaces}

\def\efill{\hfill\nopagebreak}

\def\dotfil{\leaders\hbox to.6em{\hss.\hss}\hfil}

\makeindex[name=words,columns=1,options=-s mystyle]
\begin{document}

Hello world.

\index[words]{abacate}
\index[words]{ábaco}
\index[words]{alavanca}
\index[words]{árvore}
\index[words]{arte}
\index[words]{ácaro}
\index[words]{aba}

\printindex[words]
\end{document}

Output

Marco Daniel brilliantly suggested in the chat that I try xindy instead of makeindex. So

xindy -M texindy -C utf8 -L portuguese filename.idx

produces the "correct" sorting (in Portuguese of course, I'm aware that other languages have different rules).

Unfortunately, I use some custom styles for my indices (one of them is shown above). To my despair, xindy works quite differently from makeindex, and that .ist style of mine is not supported, AFAIK.

I would be happy to move to xindy if I could port my .ist styles as well.

The workaround I'm using right now is to provide an accentless word before the "correct" one:

\index[words]{abacate}
\index[words]{abaco@ábaco}
\index[words]{alavanca}
\index[words]{arvore@árvore}
\index[words]{arte}
\index[words]{acaro@ácaro}
\index[words]{aba}

Output 2

This one works. :)
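The accent stripping in this workaround could, in principle, be automated outside TeX. Below is a rough sketch in Python, assuming the entries have the simple form `\index[words]{word}` with no existing `@`, `!` or `|` parts, and that NFD decomposition is an acceptable way to obtain the accent-free key. The names `strip_accents` and `add_sort_keys` are hypothetical helpers, not part of makeindex or imakeidx.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks removed (e.g. 'á' becomes 'a')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Matches plain entries only; entries that already contain @, ! or | are skipped.
INDEX_RE = re.compile(r"\\index\[words\]\{([^{}@!|]+)\}")


def add_sort_keys(tex_source: str) -> str:
    """Rewrite \\index[words]{ábaco} as \\index[words]{abaco@ábaco}."""

    def fix(match):
        entry = match.group(1)
        key = strip_accents(entry)
        if key == entry:
            # No accents, so makeindex can sort the entry as it is.
            return match.group(0)
        return "\\index[words]{%s@%s}" % (key, entry)

    return INDEX_RE.sub(fix, tex_source)


print(add_sort_keys(r"\index[words]{ábaco} \index[words]{arte}"))
```

Running such a script over a copy of the source before compiling would leave unaccented entries untouched and add a sort key only where one is needed.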

Is it possible to provide a sorting rule to makeindex, or maybe to give xindy styles similar to my .ist ones? I'm fine with the current workaround, but it's quite painful to remove every single accent from my index entries. I'd prefer to stick with makeindex, if possible. :)

Best Answer

Here's the simple solution. Well, not so simple, after all. :)

One small problem, which I'll solve in a future release of imakeidx: for some reason (that I don't remember now) we decided that program=xindy would call texindy anyway. Unfortunately, it seems that the calls

xindy -M mystyle -C utf8 -L portuguese words.idx

and

texindy -M mystyle -C utf8 -L portuguese words.idx

are not equivalent, as the latter throws an incomprehensible error (probably a bug in the texindy script).

Thus the following document requires running xindy manually (but you have Arara, so it's not a problem) until these small problems are corrected.

Notice that xindy provides two commands for the letter groups, which should be redefined in the preamble to do what's wanted.

\begin{filecontents*}{mystyle.xdy}
(markup-locclass-list :open "\dotfill " :sep "\dotfill ")
\end{filecontents*}

\documentclass{memoir}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

\usepackage{imakeidx}[2012/05/09]
\newcommand*{\lettergroupDefault}[1]{}
\newcommand*\lettergroup[1]{%
  \par\textit{#1}\par
  \nopagebreak
  }

\def\pfill{\unskip~\dotfill\penalty500 
  \strut\nobreak\dotfil~\ignorespaces}

\def\efill{\hfill\nopagebreak}

\def\dotfil{\leaders\hbox to.6em{\hss.\hss}\hfil}

\makeindex[name=words,columns=1,program=xindy,options=-M texindy -M mystyle -C utf8 -L portuguese]
\begin{document}

Hello world.

\index[words]{abacate}
\index[words]{ábaco}
\index[words]{alavanca}
\index[words]{árvore}
\index[words]{arte}
\index[words]{ácaro}
\index[words]{aba}

\printindex[words]
\end{document}