[TeX/LaTeX] Differences between “xindy” and “makeindex”

indexing xindy

I'd like to know which are the essential differences between makeindex and xindy (or texindy) in this sense:

  • What things can we do with xindy that we cannot do with makeindex, or vice versa?
  • What is the primary use of each one of them?
  • Syntactic differences between their style files. (.ist vs .xdy)

Best Answer

Short Answer

xindy is far more flexible than makeindex. Unlike makeindex, xindy supports UTF8, can sort according to the rules of different languages, and can handle location numbering schemes beyond the Arabic numbers, Roman numerals and letters that makeindex recognizes. The UTF8 support works best with xindy -I xindy rather than with texindy (xindy -I latex).

Long Answer

What things can we do with xindy that we can not do with makeindex?

The xindy FAQ provides a useful summary of the things xindy can do that can't be done with makeindex. Here's an abridged form of that summary:

Internationalization

xindy can be configured to process indexes for many languages with different letter sets and different sorting rules. [makeindex is hard-coded for the English alphabet.]

Location classes

makeindex is able to recognize and process arabic numbers, roman numerals and letter-based alphabets as specifiers for the indexed location. Simple composite structures of these are also possible to process. xindy provides a powerful declaration scheme called location-classes. [This means you can have locations in a completely different numbering scheme (for example, hieroglyph numerals).]

The concept of attributes

With makeindex one can assign a markup to each index entry using the encapsulators (usually following the vertical bar sign in an index entry command). For example in the specification

\index{xindy|bold}

the encapsulator is bold, which encapsulates the page numbers in the markup phase. An additional TeX macro must be supplied to assign some markup to the page number. This concept has been dropped completely in xindy in favour of a more powerful scheme called attributes. Attributes can be used to (i) define grouping and ordering rules for locations and (ii) define markup tags for the document preparation system.
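As a minimal sketch of the makeindex side of this, the document below supplies the macro that the encapsulator refers to. The name \bold is taken from the example above; defining it as a thin wrapper around \textbf is an assumption made for illustration only.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex

% Assumed definition: makeindex wraps the page number in \bold{...}
% when it writes the .ind file, so the document has to say what
% \bold does with a page number.
\newcommand*{\bold}[1]{\textbf{#1}}

\begin{document}
xindy\index{xindy|bold}
\printindex
\end{document}
```

After LaTeX and makeindex have been run, the entry in the .ind file should have the page number wrapped as \bold{1}, in the same way that the |textbf encapsulator produces \textbf{1} in the examples later in this answer.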

Cross references

Cross-references were implemented in makeindex with the encapsulation mechanism, which only served for markup purposes. This has been completely separated in xindy.

In addition to the above, another thing that you can do with xindy but not with makeindex is to have arbitrary sub-levels. With makeindex you're restricted to primary (level 0), first sub-level and second sub-level entries.
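To illustrate the depth limit, here is a sketch with a hypothetical four-level entry (the terms are made up for this example). The exact wording of the makeindex diagnostic is not reproduced here, since it may vary between versions.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
% Three levels (primary, first sub-level, second sub-level):
% accepted by both makeindex and xindy.
Mallard\index{animal!bird!mallard}

% Four levels: one more than makeindex allows, so makeindex is
% expected to reject this entry and report it in the .ilg log.
Drake\index{animal!bird!mallard!drake}

\printindex
\end{document}
```

Note that the standard theindex environment only provides \item, \subitem and \subsubitem, so even with xindy a deeper entry most likely needs a style that defines markup for the extra level before the output is usable.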

or vice versa.

The only advantages of makeindex over xindy that I can think of are:

  • makeindex should be installed with all TeX installations. A TeX installation that doesn't have makeindex is most likely extremely old. However, xindy is only included in TeX Live so MiKTeX users will need to install it separately.
  • xindy is a Perl script and so you must have the Perl interpreter installed on your computer. This is a stumbling block for some Windows users. Unix-like systems tend to have Perl preinstalled.
  • makeindex usually works with restricted \write18 but last time I tried calling xindy with a restricted \write18 it was disabled. I expect it will eventually be added to the list of allowed applications. (I can't see any reason why it shouldn't be allowed.)

What is the primary use of each one of them?

Sorting and collating. Both read a file that contains a set of terms with an associated location (or cross-reference). The terms are sorted according to the designated alphabet (the English alphabet for makeindex, or the chosen alphabet for xindy). The term may have a corresponding key that should be used for the actual sort comparison. Multiple occurrences of each term are then merged into a single entry with a sorted location list. Consecutive numbering in the location list may be compacted into a number range. This information is then written to another file, which can be input by an application such as tex or latex. (The output markup can be changed via a style file or module, which means that although makeindex and xindy are often used with TeX/LaTeX, they can be used with other systems as well.)

Syntactic differences between their style files. (.ist vs .xdy)

The syntax of the .ist format is much simpler than that of the .xdy format, but this is because it's more restrictive.

makeindex style format (.ist)

This is just a list of ⟨specifier⟩ ⟨attribute⟩ pairs. The specifiers are divided into two groups: the input specifiers and the output specifiers.

The input specifiers tell makeindex how the input is formatted. Consider the following document:

\documentclass{article}

\usepackage{makeidx}

\makeindex

\begin{document}

Duck\index{duck|textbf}.
Zebra\index{zebra}\index{stripy|see{zebra}}

\newpage

Aardvark\index{aardvark}
Zebra\index{zebra}
\emph{The Rise and Fall of the Duck Empire}%
\index{Rise and Fall of Duck Empire@\emph{The Rise and Fall of the Duck Empire}}%

\newpage

Zebra\index{zebra}
Duck\index{duck}
Aardvark\index{aardvark}
Mallard\index{duck!mallard}

\printindex

\end{document}

On the first LaTeX run, a file with the extension .idx is created and at the end of the run contains:

\indexentry{duck|textbf}{1}
\indexentry{zebra}{1}
\indexentry{stripy|see{zebra}}{1}
\indexentry{aardvark}{2}
\indexentry{zebra}{2}
\indexentry{Rise and Fall of Duck Empire@\emph{The Rise and Fall of the Duck Empire}}{2}
\indexentry{zebra}{3}
\indexentry{duck}{3}
\indexentry{aardvark}{3}
\indexentry{duck!mallard}{3}

This is the default format for makeindex but can be explicitly set in a .ist file using:

actual '@'
arg_close '}'
arg_open '{'
encap '|'
keyword '\\indexentry'
level '!'

There are some other input specifiers as well. (See Index Preparation and Processing.) These specifiers enable makeindex to correctly parse the input file.

The output specifiers tell makeindex how to format the output. If you run makeindex on the above example, the resulting .ind file will look like:

\begin{theindex}

  \item aardvark, 2, 3

  \indexspace

  \item duck, \textbf{1}, 3
    \subitem mallard, 3

  \indexspace

  \item \emph{The Rise and Fall of the Duck Empire}, 2

  \indexspace

  \item stripy, \see{zebra}{1}

  \indexspace

  \item zebra, 1--3

\end{theindex}

This uses the default output specifiers, which include:

preamble "\\begin{theindex}\n"
postamble "\n\n\\end{theindex}\n"
group_skip "\n\n  \\indexspace\n"
item_0 "\n  \\item "
item_1 "\n     \\subitem "
delim_0 ", "
delim_1 ", "

This .ind file can now be input by LaTeX (via \printindex). The resulting index looks like:

Image of index

If I wanted, say, to have headings at the start of each letter group, I can create a file called, say, test.ist that contains:

headings_flag 1
heading_prefix "  \\item\\textbf{"
heading_suffix "}\n  \\indexspace\n"

Now I need to run makeindex with -s test.ist which will now write the following to the .ind file:

\begin{theindex}
  \item\textbf{A}
  \indexspace

  \item aardvark, 2, 3

  \indexspace
  \item\textbf{D}
  \indexspace

  \item duck, \textbf{1}, 3
    \subitem mallard, 3

  \indexspace
  \item\textbf{R}
  \indexspace

  \item \emph{The Rise and Fall of the Duck Empire}, 2

  \indexspace
  \item\textbf{S}
  \indexspace

  \item stripy, \see{zebra}{1}

  \indexspace
  \item\textbf{Z}
  \indexspace

  \item zebra, 1--3

\end{theindex}

The next LaTeX run now produces the index:

Image of index

The entry "The Rise and Fall of the Duck Empire" is listed in the "R" category rather than the "T" category because I set the sort key for that entry to "Rise and Fall of Duck Empire".
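The headings example can be packaged as a single file, which may be convenient for testing. The sketch below uses the filecontents* environment to write test.ist next to the document; the file name test.tex and the build commands in the trailing comments are assumptions, and the document body is a shortened version of the earlier example.

```latex
% Write the style file next to the document (the starred form adds no
% header comments). On older LaTeX releases filecontents* must come
% before \documentclass, so it is placed first here.
\begin{filecontents*}{test.ist}
headings_flag 1
heading_prefix "  \\item\\textbf{"
heading_suffix "}\n  \\indexspace\n"
\end{filecontents*}

\documentclass{article}
\usepackage{makeidx}
\makeindex

\begin{document}
Duck\index{duck|textbf}
Zebra\index{zebra}
\emph{The Rise and Fall of the Duck Empire}%
\index{Rise and Fall of Duck Empire@\emph{The Rise and Fall of the Duck Empire}}

\printindex
\end{document}

% Assumed build sequence, with this file saved as test.tex:
%   latex test
%   makeindex -s test.ist test.idx
%   latex test
```

If test.ist already exists, filecontents* normally leaves it untouched and issues a warning, so delete the old file first when changing the style.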

xindy style format (.xdy)

Unlike makeindex, xindy has modules, which can load other modules, so you can build on existing styles. In addition, xindy has an --input-markup (-I) command line switch that is used to indicate the input markup. There are three supported markup settings: latex, omega and xindy.

xindy -I latex

Using texindy is equivalent to calling xindy with -I latex and with the modules that enable xindy to parse files written using the default makeindex input specifiers. So, for example, the above .idx file created by LaTeX can be processed directly by texindy. If the file is called, say, test.idx then texindy test.idx will create an .ind file that contains:

\begin{theindex}
  \providecommand*\lettergroupDefault[1]{}
  \providecommand*\lettergroup[1]{%
      \par\textbf{#1}\par
      \nopagebreak
  }

  \lettergroup{A}
  \item aardvark, 2, 3

  \indexspace

  \lettergroup{D}
  \item duck, \textbf{1}, 3
    \subitem mallard, 3

  \indexspace

  \lettergroup{R}
  \item \emph{The Rise and Fall of the Duck Empire}, 2

  \indexspace

  \lettergroup{S}
  \item stripy, \see{zebra}{}

  \indexspace

  \lettergroup{Z}
  \item zebra, 1--3

\end{theindex}

This is similar to the .ind file created by makeindex except that it uses \lettergroup to mark up the category headings. If this command isn't already defined, it will be defined via \providecommand at the start of the .ind file. If you want to change the way the heading is formatted, you just need to define \lettergroup before \printindex. This is simpler than the makeindex example shown above, which needed a custom .ist file to make the headings appear.
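A minimal sketch of such a redefinition follows. The formatting chosen here (centred, bold sans serif) is arbitrary and only meant to show where the definition goes.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex

% Defined before \printindex, so the \providecommand* in the .ind
% file written by texindy leaves this version in place.
% The formatting below is an arbitrary choice for illustration.
\newcommand*{\lettergroup}[1]{%
  \par\medskip
  {\centering\sffamily\bfseries #1\par}%
  \nopagebreak
}

\begin{document}
Aardvark\index{aardvark}
Zebra\index{zebra}
\printindex
\end{document}
```

Because the .ind file uses \providecommand* rather than \newcommand, no error is raised when the command already exists, and the same .ind file works with or without the custom definition.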

Writing a xindy module is quite complicated and too long to discuss in this (already very long) answer, but the xindy FAQ gives an introduction. However, there are a number of modules supplied by xindy that cover common requirements, in particular the language modules. The modules are in subdirectories of TEXMF/xindy/modules/ where TEXMF is the base of the TEXMF tree. The language modules are in TEXMF/xindy/modules/lang/ and are identified via the -L command line option.

Suppose my .idx file now looks like:

\indexentry{ænder|textbf}{1}
\indexentry{zebra}{1}
\indexentry{aardvark}{2}
\indexentry{zebra}{2}
\indexentry{zebra}{3}
\indexentry{ænder}{3}
\indexentry{aardvark}{3}
\indexentry{ænder!gråand}{3}

makeindex creates the following .ind file:

\begin{theindex}

  \item aardvark, 2, 3

  \indexspace

  \item zebra, 1--3

  \indexspace

  \item ænder, \textbf{1}, 3
    \subitem gråand, 3

\end{theindex}

Here makeindex has positioned "ænder" after "zebra". This may not look too bad at first glance if that's the correct position for your language, but now try adding headings by creating an .ist file that contains:

headings_flag 1
heading_prefix "  \\item\\textbf{"
heading_suffix "}\n  \\indexspace\n"

Running makeindex with this style results in:

\begin{theindex}
  \item\textbf{A}
  \indexspace

  \item aardvark, 2, 3

  \indexspace
  \item\textbf{Z}
  \indexspace

  \item zebra, 1--3

  \indexspace
  \item\textbf{Ã}
  \indexspace

  \item Ênder, \textbf{1}, 3
    \subitem gråand, 3

\end{theindex}

The UTF8 characters have become mangled as makeindex has only grabbed the first octet of æ for the heading. This has ruined the file encoding.

In theory, if I want to use texindy instead, I need to specify the language using the -L switch (in this case -L danish) and the encoding using the -C switch (in this case -C utf8). Unfortunately this results in the error:

(require "tex/inputenc/utf8.xdy")
ERROR: Could not find file "tex/inputenc/utf8.xdy" !

and no .ind file is produced.

The error goes away if I use -M lang/danish/utf8. This results in the .ind file containing:

\begin{theindex}
  \providecommand*\lettergroupDefault[1]{}
  \providecommand*\lettergroup[1]{%
      \par\textbf{#1}\par
      \nopagebreak
  }

  \lettergroup{A}
  \item aardvark, 2, 3
  \item ænder, \textbf{1}, 3
    \subitem gråand, 3

  \indexspace

  \lettergroup{Z}
  \item zebra, 1--3

\end{theindex}

which has put "ænder" in the "A" letter group (which is incorrect for the Danish alphabet, where æ comes after z).

Getting the .idx file into the format shown above is somewhat harder. The following XeLaTeX document works fine:

\documentclass{article}

\usepackage{fontspec}

\usepackage{makeidx}

\makeindex

\begin{document}

Ænder\index{ænder|textbf}
Zebra\index{zebra}

\newpage

Aardvark\index{aardvark}
Zebra\index{zebra}

\newpage

Zebra\index{zebra}
Ænder\index{ænder}
Aardvark\index{aardvark}
Gråand\index{ænder!gråand}

\printindex

\end{document}

The equivalent LaTeX document:

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[danish]{babel}

\usepackage{makeidx}

\makeindex

\begin{document}

Ænder\index{ænder|textbf}
Zebra\index{zebra}

\newpage

Aardvark\index{aardvark}
Zebra\index{zebra}

\newpage

Zebra\index{zebra}
Ænder\index{ænder}
Aardvark\index{aardvark}
Gråand\index{ænder!gråand}

\printindex

\end{document}

produces:

\indexentry{\IeC {\ae }nder|textbf}{1}
\indexentry{zebra}{1}
\indexentry{aardvark}{2}
\indexentry{zebra}{2}
\indexentry{zebra}{3}
\indexentry{\IeC {\ae }nder}{3}
\indexentry{aardvark}{3}
\indexentry{\IeC {\ae }nder!gr\IeC {\r a}and}{3}

which confuses texindy.

xindy -I xindy

With the xindy input markup, the .idx file has entries in the format:

(indexentry :tkey (("sort" "term") ) :locref "location" :attr "attribute" )

where sort is the text used by the comparison function when sorting, term is how the entry should be typeset in the .ind file, location is the associated location (page number) for this entry and attribute is the associated attribute. This is the format used by the glossaries package when used with the xindy package option.

The differences between the syntax used by makeindex and xindy can be illustrated by examining the files created using the glossaries package.

Consider the following LaTeX document:

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[danish]{babel}

\usepackage[index,style=indexgroup]{glossaries}

\makeglossaries

\newterm[name={æ}nder]{aender}
\newterm{zebra}
\newterm{aardvark}
\newterm[parent=aender,name=gråand]{graand}

\begin{document}

\Gls[format=textbf]{aender}
\Gls{zebra}

\newpage

\Gls{aardvark}
\Gls{zebra}

\newpage

\Gls{zebra}
\Gls{aender}
\Gls{aardvark}
\Gls{graand}

\printindex

\end{document}

This is analogous to the earlier makeidx example. By default, this document assumes that makeindex will be used. This creates the .idx file containing:

\glossaryentry{{æ}nder?\glossentry{aender}|setentrycounter[]{page}\textbf}{1}
\glossaryentry{zebra?\glossentry{zebra}|setentrycounter[]{page}\glsnumberformat}{1}
\glossaryentry{aardvark?\glossentry{aardvark}|setentrycounter[]{page}\glsnumberformat}{2}
\glossaryentry{zebra?\glossentry{zebra}|setentrycounter[]{page}\glsnumberformat}{2}
\glossaryentry{zebra?\glossentry{zebra}|setentrycounter[]{page}\glsnumberformat}{3}
\glossaryentry{{æ}nder?\glossentry{aender}|setentrycounter[]{page}\glsnumberformat}{3}
\glossaryentry{aardvark?\glossentry{aardvark}|setentrycounter[]{page}\glsnumberformat}{3}
\glossaryentry{{æ}nder?\glossentry{aender}!gråand?\subglossentry{1}{graand}|setentrycounter[]{page}\glsnumberformat}{3}

This uses \glossaryentry instead of \indexentry and the actual character is ? instead of @, so glossaries creates a makeindex .ist file that contains:

actual '?'
encap '|'
level '!'
quote '"'
keyword "\\glossaryentry"
preamble "\\glossarysection[\\glossarytoctitle]{\\glossarytitle}\\glossarypreamble\n\\begin{theglossary}\\glossaryheader\n"
postamble "\%\n\\end{theglossary}\\glossarypostamble\n"
group_skip "\\glsgroupskip\n"
item_0 "\%\n"
item_1 "\%\n"
item_2 "\%\n"
item_01 "\%\n"
item_x1 "\\relax \\glsresetentrylist\n"
item_12 "\%\n"
item_x2 "\\relax \\glsresetentrylist\n"
delim_0 "\{\\glossaryentrynumbers\{\\relax "
delim_1 "\{\\glossaryentrynumbers\{\\relax "
delim_2 "\{\\glossaryentrynumbers\{\\relax "
delim_t "\}\}"
delim_n "\\delimN "
delim_r "\\delimR "
headings_flag 1
heading_prefix "\\glsgroupheading\{"
heading_suffix "\}\\relax \\glsresetentrylist "
symhead_positive "glssymbols"
numhead_positive "glsnumbers"
page_compositor "."
suffix_2p ""
suffix_3p ""

If, on the other hand, you added the xindy package option when you load glossaries:

\usepackage[index,xindy]{glossaries}

The .idx file now looks like:

(indexentry :tkey (("{æ}nder" "\\glossentry{aender}") ) :locref "{}{1}" :attr "pagetextbf" ) 
(indexentry :tkey (("zebra" "\\glossentry{zebra}") ) :locref "{}{1}" :attr "pageglsnumberformat" ) 
(indexentry :tkey (("aardvark" "\\glossentry{aardvark}") ) :locref "{}{2}" :attr "pageglsnumberformat" ) 
(indexentry :tkey (("zebra" "\\glossentry{zebra}") ) :locref "{}{2}" :attr "pageglsnumberformat" ) 
(indexentry :tkey (("zebra" "\\glossentry{zebra}") ) :locref "{}{3}" :attr "pageglsnumberformat" ) 
(indexentry :tkey (("{æ}nder" "\\glossentry{aender}") ) :locref "{}{3}" :attr "pageglsnumberformat" ) 
(indexentry :tkey (("aardvark" "\\glossentry{aardvark}") ) :locref "{}{3}" :attr "pageglsnumberformat" ) 
(indexentry :tkey (("{æ}nder" "\\glossentry{aender}") ("gråand" "\\subglossentry{1}{graand}") ) :locref "{}{3}" :attr "pageglsnumberformat" ) 

The extended Latin characters, such as å, haven't been expanded as they were in the earlier makeidx example since, by default, glossaries "sanitizes" the sort key. (The reason for the braces around æ is discussed in the UTF8 section of the mfirstuc manual.)

This time, instead of creating an accompanying .ist file, glossaries creates a xindy .xdy file that is considerably larger, and is too large to reproduce here without exceeding the maximum length of a StackExchange answer. However, if you try the above example yourself, you'll be able to see the syntactic differences.

Running the Perl script makeglossaries test (where the example file is called test.tex) is equivalent to running xindy as:

xindy  -L danish -C utf8 -I xindy -M "test" -t "test.ilg" -o "test.ind" "test.idx"

Although -L danish -C utf8 has been used, this now doesn't produce the earlier error with texindy as xindy is no longer trying to input tex/inputenc/utf8.xdy. The index now looks like:

Image of index: zebra, ænder (with sub-entry gråand), aardvark

Again, switching to XeLaTeX makes the document simpler:

\documentclass{article}

\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{danish}

\usepackage[index,style=indexgroup,xindy]{glossaries}

\makeglossaries

\newterm{ænder}
\newterm{zebra}
\newterm{aardvark}
\newterm[parent=ænder]{gråand}

\begin{document}

\Gls[format=textbf]{ænder}
\Gls{zebra}

\newpage

\Gls{aardvark}
\Gls{zebra}

\newpage

\Gls{zebra}
\Gls{ænder}
\Gls{aardvark}
\Gls{gråand}

\printindex

\end{document}