I'm trying to implement a (quick) sort algorithm.
Here is what I can do :
\documentclass[preview, border=7mm]{standalone}
\usepackage{pgffor}
% \push{xs}{\i} add \i in the array \xs
\def\push#1#2{\expandafter\xdef\csname #1\endcsname{\expandafter\ifx\csname #1\endcsname\empty\else\csname #1\endcsname,\fi#2}}
\newcommand*{\qsort}[1]{%
\edef\arr{#1}% the array to sort
\foreach[count=\len] \i in \arr{}% take the length
\ifnum\len>1\relax% if 0 or 1 elements in the array, nothing to sort
\pgfmathsetmacro\midval{{\arr}[int(\len/2)]}% take the middle element (a random one is probably better) as pivot
\let\xs\empty\let\xe\empty\let\xl\empty% \xs (s for small), \xe (e for equal), \xl (l for large)
\foreach \i in \arr{%
\ifdim \i pt < \midval pt%
\push{xs}{\i}% if \i < pivot, push it in \xs
\else%
\ifdim \i pt = \midval pt%
\push{xe}{\i}% if \i = pivot, push it in \xe
\else%
\push{xl}{\i}% if \i > pivot, push it in \xl
\fi%
\fi%
}%
\edef\nxs{\xs}\edef\nxe{\xe}\edef\nxl{\xl}% save the globals
\ifx\nxs\empty\else{\qsort\nxs},\fi% cal qsort for \nxs if not empty
\nxe% copy \nxe (in general containing only the pivot element)
\ifx\nxl\empty\else,{\qsort\nxl}\fi% cal qsort for \nxl if not empty
\else%
\arr% 0 or 1 element array is already sorted
\fi%
}
\begin{document}
\let\temp\empty
\foreach \i in {1,...,10}{\pgfmathparse{rnd}\push{temp}{\pgfmathresult}}
\temp\newline
when sorted becomes\newline
\qsort\temp
\end{document}
My main problem is that due to \foreach
this macro is not expandable. So I can't make \edef\sorted{\qsort\data}
.
So my questions is : How can I define an expandable (quick) sort, working with arrays like {.1,10,.25}
?
Precision: I have put 'quick' in parentheses, because for me the most important is to be expandable. My other criteria are : to be, if possible, short and, if possible, fast.
Best Answer
This answer has
fourfivesixfour routines:the last update: leaner
code 6
, as I understand better my hacky use of\pdfescapestring
and could remove some superfluous extras (if they had been really needed, these extras would not have been enough anyhow).the very last update: improved sub-routine for merging in
code 6
bringing at least a2x
speed increase.the first one is the direct adaptation of the code to be found in the xint.pdf documentation since
2013/10/29
release; at the start of section7.28 The Quick Sort algorithm illustrated
there is an expandable implementation of the QuickSort algorithm. As it handled braced items, rather than comma separated ones, I needed to add conversion back and forth from comma separated values to braced items. This code usesxinttools
for list utilities andxintfrac
for comparison of "real" numbers.the second code treats directly the comma separated values. No need for package
xinttools
anymore. Still usesxintfrac
, and needs release1.2
due to using\xintdothis/\xintorthat
which in earlier releases exist under names containing_
only. It is more efficient than code version 1 also because it does less comparisons (code 1 does each comparison thrice (!!!), code 2 only twice).Code
1
and2
are kept only for sentimental reasons. The third code is a variant of code 2 which does each comparison only once, thus I expect it to be about twice faster. But for very long inputs, perhaps it has a penalty from longer token lists. Usesxintfrac 1.2
as code 2.the fourth code is exactly like code 3, but rather than
xintfrac
it assumes the numbers may be compared with\ifdim
. Thus this last code requires no package whatsoever. Needless to say it is faster too. But prone toDimension too large
error if the input does not fit.The previous update was to allow codes
2
,3
,4
to accept empty input (not empty values, only empty global input) gracefully.the fifth
and finalone handles alphabetical inputs either with\pdfstrcmp
(underpdflatex
) or with\strcmp
(underXeTeX
). For a change, this variant expands completely only in an\edef
(or\message
, etc..), a full expansion of the first token is not enough contrarily to what1
,2
,3
,4
offer. Empty or blank items in the input will be discarded from output, initial spaces are removed and each comma of the output has exactly one space after it. Final spaces of items (before commas) are left as they are (this will mean most of the time exactly one space token, as it requires special circumstances to create inputs with multiple space tokens in a row). It is not hard to modify the code to act like1
,2
,3
,4
and not require\edef
so as to allow items which are macros which one does not want to see expanded in the output.A hopefully final edit improves a bit the speed of code
5
as it exploits the fact that the blanks are gone after the first pass.The sixth one is an expandable implementation of the Merge Sort algorithm, in a bottom up variant. For expandability, I could have gone the top down way, but that would require (at least in a simple-minded approach) a count of items to separate in two parts, and then recursively. I opted for a bottom-up approach, which a priori causes grave problems of implementations in an expandable way (**), problems which are solved by a trick for propagating expansion which I learned from correspondance with Bruno Le Floch: the trick is to use the
pdftex
primitive\pdfescapestring
. In this manner I can make successive passes, first two by two, then four by four, then eight by eight, each time leaving behind the already treated material. Other techniques are available but this would populate irreversibly parts ofTeX
memory.There is an inconvenience which is that spaces end up escaped. Thus I suppress them in the first pass and the final output has no spaces contrarily to the other methods: they can be added in a final pass. I give the code for sorting decimal numbers comparable with
\ifdim
. One could usexintfrac
for arbitrary precision decimals or fractions. A first pass would apply\xintRaw
to each item, to convert scientific notation toxintfrac
A/B[N]
notation, because thee
would get catcode12
from\pdfescapestring
and on the next pass, this would not be recognized (alternative is to do the comparison in\xintthefloatexpr...\relax
where the catcode ofe
does not matter).This method needs explicit inputs, it will fail on
\a, \b, \c, \d
case.(**) not problems of principle, but problems of efficiency: it would be possible to keep all tokens upfront, especially as we essentially only have to grab each time two more delimited things. For this we could temporarily put the already parsed material beyond the new two things, merge them, and then swap back at the end of the parsed stuff (if we want to maintain a stable sort). This means lots of token shuffling, which is precisely the thing we want to minimize when possibly handling thousands of tokens.
I haven' much tested the sixth code, but it appears to be the fastest one. Almost
6x
faster thancode 4
on a test with 1000 decimal numbersabcd.wxyz
(after the improvement brought with last update).because my answer was exceeding the maximal size I have deleted code 1 and code 2 to make room for code 6
All quicksort variants pick each time the first element as pivot, which is bad if the input is already close to sorted (or reverse sorted).
The mergesort has no such issues.
code 1 (removed)
code 2 (removed)
code 3
code 4. No package needed, but limited to decimal numbers which one can compare with
\ifdim
.code 5. Again no package needed, this variant does alphabetical comparisons using
\pdfstrcmp
orXeTeX
's\strcmp
. This one requires an\edef
but one could change it to act like the other ones. Blank items from the input are gobbled and do not make it to the output. The output has one space after each comma preceeding an item.Code 6: Bottom-up Merge Sort using advanced techniques of expandable programming. Requires
pdftex
(or at least equivalent to\pdfescapestring
). This code for decimal numbers acceptable by\ifdim
. Can be extended to other comparison routines if desired.