[Tex/LaTex] How to sort an alphanumeric list

programmingsorting

I have been trying to find a package or a method to sort lists of names in an easy manner in TeX/LaTeX. I have tried some of the routines in the xfor package, as well as looked at some of the new macros in LaTeX3 but was not very successful with either of them.

I have come with the idea of using the MakeIndex program to sort and categorize. It is available with every distribution and the LaTeX class is very short and easily hackable being approximately 36 lines long (including the glossary macros that I can eliminate) and a few more definitions for styling in the basic book or article class.

Here is a minimal example that sorts a list of names by profession (category).

\documentclass[11pt]{article}
\usepackage{makeidx}
\def\addName#1#2{\index{Name!#1}\index{#2!#1}}
\makeindex
\begin{document}
\renewcommand{\indexname}{Famous and Infamous Sorted People}
\addName{Leslie Lamport}{Computer Scientist} 
\addName{Donald Knuth}{Computer Scientist}
\addName{Tim Berners Lee}{Computer Scientist}
\addName{Brian Kernighan}{Computer Scientist}
\addName{Noam Chomsky}{Linguist}
\addName{Yiannis Lazarides}{Lifelong Trainee \protect\TeX nician}
\addName{Leonard Euler}{Mathematician}
\addName{Carl Friendrich Gauss}{Mathematician}
\addName{August Ferdinard M\"{o}bius}{Mathematician}

%% Importing the .ind file rather than use \printindex
%% so we do not need to redefine the command
%% 
\input{indextest.ind}

\begin{verbatim}
\begin{theindex}
\item Lifelong Trainee \TeX nician
    \subitem Yiannis Lazarides, 1
  \item Linguist
    \subitem Noam Chomsky, 1 
\end{theindex}
\end{verbatim}
\end{document}

Is this a good idea? Are there any packages devoted to alphanumeric sorting? I know it can easily be done with an external script, but I am looking for a TeX/LaTeX solution. Do you think it is a good approach?

Best Answer

I didn't look at Charles's code because I wanted to see how difficult it would be to implement a generic sorting macro that was expandable. It turns out that it's more or less straight-forward to do in a continuation-passing style.

\documentclass{article}
\usepackage{etoolbox}
\makeatletter
% #1 - comparator
% #2 - token list to sort
\newcommand\sort[2]{%
        \ifstrempty{#2}
        {}% else
        {%
                \sort@begin#1{}#2\sort@s\sort@begin
        }%
}

% helpers
\def\sort@s{\sort@s}
\def\ifsort@s#1{%
        \ifx\sort@s#1%
                \expandafter\@firstoftwo
        \else
                \expandafter\@secondoftwo
        \fi
}

% #1 - comparator
% #2 - tokens processed so far
% #3 - smallest token so far
% #4 - rest of the list
\def\sort@begin#1#2#3#4\sort@begin{%
        \ifsort@s{#4}
        {%
                \sortend{#3}%
                \sort#1{#2}%
        }% else
        {%
                \sort@go#1{#2}{#3}#4\sort@go
        }%
}

% #1 - comparator
% #2 - tokens processed so far
% #3 - smallest token so far
% #4 - token under consideration
% #5 - rest of the list
\def\sort@go#1#2#3#4#5\sort@go{%
        #1{#3}{#4}{\sort@output#1{#2}{#5}}%
}
% #1 - comparator
% #2 - tokens processed so far
% #3 - rest of the list
% #4 - smaller of the two tokens
% #5 - larger of the two tokens
\def\sort@output#1#2#3#4#5{%
        \ifsort@s{#3}
        {%
                \sortoutput{#4}%
                \sort#1{#2{#5}}%
        }% else
        {%
                \sort@begin#1{#2{#5}}{#4}#3\sort@begin
        }%
}

\def\sort@numlt#1#2#3{%
        \ifnumcomp{#1}<{#2}
        {#3{#1}{#2}}% else
        {#3{#2}{#1}}%
}

\def\sort@numgt#1#2#3{%
        \ifnumcomp{#1}>{#2}
        {#3{#1}{#2}}% else
        {#3{#2}{#1}}%
}

\def\sort@alpha#1#2#3{%
        \ifnumcomp{\pdfstrcmp{#1}{#2}}<{0}
        {#3{#1}{#2}}% else
        {#3{#2}{#1}}%
}

\newcommand*\sortnumlt{\sort\sort@numlt}
\newcommand*\sortnumgt{\sort\sort@numgt}
\newcommand*\sortalpha{\sort\sort@alpha}
\makeatother

% Change these to change out the sort outputs.
\newcommand*\sortoutput[1]{#1, }
\newcommand*\sortend[1]{#1.}
\begin{document}
\sortnumgt{87632147{55}9{8/2}}

\sortalpha{{Goodbye}{Cruel}{World}}

\renewcommand*\sortoutput[1]{#1}
\renewcommand*\sortend[1]{#1}
\edef\temp{\sortnumlt{87632147{55}9}}
\texttt{\meaning\temp}
\end{document}

I hope the code is readable. I tried to document the arguments to each function. In particular, the first argument to \sort is a comparator control sequence that must take three arguments. The first two are the two elements of the following list to compare and the third is the continuation. Basically, the comparator should expand to either #3{#1}{#2} or #3{#2}{#1} depending on #1 being "less than" #2.

I've implemented three such comparators. The first two compare lists of numbers whereas the third does alphanumeric string comparison using \pdfstrcmp. Since the number comparisons use \ifnumcomp from etoolbox, you can use arbitrary arithmetic expressions for the elements, hence {8/2} in the list.

Finally, to show that this is expandable (at least it is whenever the comparator, \sortoutput, and \sortend are), \temp is defined using \edef and its meaning is typeset to ensure that it has been set to the appropriate value: macro:->12346778955.

Note that it would also be straight-forward to thread \sortoutput and \sortend through all of these macros so that multiple \sorts could be used in the same expansion. I just didn't think about adding those until I had written all of the rest of the code (more or less).

Note further that this is selection sort and thus would take Θ(n2) time, even in the best case. However, this being TeX and it having to construct the token lists for each argument each time, I think this implementation is actually Θ(n3) time. So don't try it on large lists.

Related Question