Update: I have revised my question based on the comment by Bruno Le Floch and some further research.
As some may know, I'm implementing a huge TeX macro package which has to do a lot of internal computation, and hence will deliver much lower overall performance "per page" than, for instance, your usual LaTeX document.
Looking for quick wins, I'm currently investigating the most performant ways to do some really basic computations in TeX.
One thing I use a lot is array-like structures: large numbers of values stored in main memory, accessible by one or several numeric keys.
So my question here is basically:
What is the best way to implement a one- or multi-dimensional array in TeX, concentrating on numerical access keys?
What I know so far
Currently, to store <value> in array <myarray> with key <key>, I basically do
\expandafter\def\csname<myarray>@<key>@DocScape\endcsname{<value>}
(assuming <key> is in a form which can go into a cs name).
Of course, this doesn't give a true random-access, compact array structure; instead it uses the internal hash table, in the hope of getting good performance for inserting a new array element and "almost" O(1) access for retrieving array elements.
Additionally, I can't imagine any other way of getting comparable performance with a TeX implementation.
Hence, for the moment I'm looking into ways of improving the performance of this basic technique.
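In conventional programming terms, this technique treats TeX's csname table as one global hash map with composite string keys. A rough Python model of the store/retrieve operations (the names and the @...@DocScape key layout are just illustrations of the scheme above):

```python
# Toy model: TeX's csname table as one global hash map.  Storing an array
# element "defines a macro" whose name embeds the array name and the key;
# retrieval is a single table lookup, i.e. roughly O(1) while collisions
# stay rare.
csnames: dict[str, str] = {}

def store(array: str, key: int, value: str) -> None:
    # like \expandafter\def\csname<myarray>@<key>@DocScape\endcsname{<value>}
    csnames[f"{array}@{key}@DocScape"] = value

def retrieve(array: str, key: int) -> str:
    # like \csname<myarray>@<key>@DocScape\endcsname
    return csnames[f"{array}@{key}@DocScape"]

store("foo", 42, "hello")
print(retrieve("foo", 42))  # → hello
```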
Testing the raw hash performance
I've heard several times now that TeX's hashing algorithm for cs names is not very good, so one performance problem could be a large number of hash conflicts, leading to performance much worse than O(1). However, I haven't looked into the sources, so I don't know in what way exactly it is bad, and I haven't found any precise information on this subject on this site.
If hash performance is bad, I can imagine the following things which may influence the overall performance of the array construct:
- The total length of the cs name.
- The length of a "constant" prefix before the "significant part" (i.e. the encoding of <key>).
- The way <key> is encoded.
To test what form of constructing the cs name for array elements gives the best performance, I made the following test suite:
\input random
% Three different implementations of a "hash array" with numerical
% keys using different key encodings.
\input numhasharray.tex
\input hashnumhasharray.tex
\input codenumhasharray.tex
\newcount\iter
\newcount\rcount
\def\gobble#1{}
% A macro for setting up one experiment to test the "raw hash performance".
\def\mkhashexperiment#1#2#3#4#5%
{%
\iter0
\immediate\openout2=#1-setup.tex
\immediate\write2{\noexpand\message{starting test "#1"; size #2; prefix "#3";}}
\immediate\write2{\noexpand\message{postfix "#4"; coding \string\string\string#5}}%
\loop
\ifnum\iter<#2
\advance\iter1
\immediate\write2
{%
\noexpand\expandafter\noexpand\def
\noexpand\csname#3#5\iter#4\noexpand\endcsname{\number\iter}%
}%
\repeat
\immediate\closeout2 %
\iter0
\immediate\openout2=#1.tex
\immediate\write2{\expandafter\gobble\string\\input\space#1-setup}%
\immediate\write2{\expandafter\gobble\string\\newcount\string\mycount}%
\loop
\ifnum\iter<#20 % note: "#20" is #2 with a literal 0 appended, i.e. ten times the array size
\advance\iter1
\setrannum{\rcount}{1}{#2}
\immediate\write2
{%
% What is the most performant way to "just retrieve" the array
% value without causing any further computation cost?
\noexpand\mycount
\noexpand\csname#3#5\rcount#4\noexpand\endcsname\relax
}%
\repeat
\immediate\write2{\expandafter\gobble\string\\bye}%
\immediate\closeout2 %
\immediate\write1{time tex #1;}
}
% A macro for setting up one experiment to test the performance of an
% array implementation.
\def\mkarrayexperiment#1#2#3#4#5%
{%
\iter0
\immediate\openout2=#1-setup.tex
\immediate\write2{\noexpand\message{starting test "#1"; size #2;
array implementation "#3"}}%
\immediate\write2{\expandafter\gobble\string\\input\space#3}%
\loop
\ifnum\iter<#2
\advance\iter1
\immediate\write2
{%
\string#4{foo}{\number\iter}{\number\iter}%
}%
\repeat
\immediate\closeout2 %
\iter0
\immediate\openout2=#1.tex
\immediate\write2{\expandafter\gobble\string\\input\space#3}%
\immediate\write2{\expandafter\gobble\string\\input\space#1-setup}%
\immediate\write2{\expandafter\gobble\string\\newcount\string\mycount}%
\loop
\ifnum\iter<#20 % note: "#20" is #2 with a literal 0 appended, i.e. ten times the array size
\advance\iter1
\setrannum{\rcount}{1}{#2}
\immediate\write2
{%
% What is the most performant way to "just retrieve" the array
% value without causing any further computation cost?
\noexpand\mycount\string#5{foo}{\number\rcount}\relax
}%
\repeat
\immediate\write2{\expandafter\gobble\string\\bye}%
\immediate\closeout2 %
\immediate\write1{time tex #1;}
}
% Number of array entries to generate; 10x this number of random retrievals
% is generated.
\def\experimentsize{100000}
% Execute this sh script to run the tests.
\immediate\openout1=testhash.sh
\immediate\write1{(}
% Testing the performance of the internal hash table.
% The usual way of implementing an array: Just put the access key in
% the name directly as a \number.
\mkhashexperiment{testhash1}{\experimentsize}{pre}{}{\number}
\mkhashexperiment{testhash2}{\experimentsize}{}{post}{\number}
\mkhashexperiment{testhash3}{\experimentsize}{verylongprefixtostresshashtable}{}{\number}
\mkhashexperiment{testhash4}{\experimentsize}{}{verylongprefixtostresshashtable}{\number}
\mkhashexperiment{testhash5}{\experimentsize}{verylong}{prefixtostresshashtable}{\number}
\mkhashexperiment{testhash6}{\experimentsize}{verylongprefixto}{stresshashtable}{\number}
% Encoding provided by Bruno Le Floch to optimise number hashing.
\mkhashexperiment{testhash19}{\experimentsize}{pre}{}{\hashnumber}
\mkhashexperiment{testhash20}{\experimentsize}{}{post}{\hashnumber}
\mkhashexperiment{testhash21}{\experimentsize}{verylongprefixtostresshashtable}{}{\hashnumber}
\mkhashexperiment{testhash22}{\experimentsize}{}{verylongprefixtostresshashtable}{\hashnumber}
\mkhashexperiment{testhash23}{\experimentsize}{verylong}{prefixtostresshashtable}{\hashnumber}
\mkhashexperiment{testhash24}{\experimentsize}{verylongprefixto}{stresshashtable}{\hashnumber}
% Best performance I found so far.
\mkhashexperiment{testhash25}{\experimentsize}{pre}{}{\mynumcode}
\mkhashexperiment{testhash26}{\experimentsize}{}{post}{\mynumcode}
\mkhashexperiment{testhash27}{\experimentsize}{verylongprefixtostresshashtable}{}{\mynumcode}
\mkhashexperiment{testhash28}{\experimentsize}{}{verylongprefixtostresshashtable}{\mynumcode}
\mkhashexperiment{testhash29}{\experimentsize}{verylong}{prefixtostresshashtable}{\mynumcode}
\mkhashexperiment{testhash30}{\experimentsize}{verylongprefixto}{stresshashtable}{\mynumcode}
% Testing the performance of different array implementations.
% Hash array with simple numerical keys.
\mkarrayexperiment{testnumhasharray}{\experimentsize}{numhasharray}{\numhashstore}{\numhashretrieve}
% Hash array with encoding provided by Bruno Le Floch to optimise
% number hashing.
\mkarrayexperiment{testhashnumhasharray}{\experimentsize}{hashnumhasharray}{\hashnumhashstore}{\hashnumhashretrieve}
% Hash array with "hash spread" encoding.
\mkarrayexperiment{testcodenumhasharray}{\experimentsize}{codenumhasharray}{\codenumhashstore}{\codenumhashretrieve}
\immediate\write1{) \string&> testhash.log}
\immediate\closeout1 %
\bye
To generate the tests, you need the following implementation files for different hash-based array variants:
numhasharray.tex
% The most basic "hash array" for numeric keys:
% just use the number as a key.
\def\numhashstore#1#2#3{\expandafter\def\csname\number#2\string_#1\string_nh\endcsname{#3}}
\def\numhashretrieve#1#2{\csname\number#2\string_#1\string_nh\endcsname}
hashnumhasharray.tex
% This encoding is by Bruno Le Floch, directly constructed to optimise
% hash performance.
\def\step#1{#1---\step}
\def\endstep#1\step{}
\def\hashnumber#1{\expandafter\step \number#1 \endstep}
\def\hashnumhashstore#1#2#3{\expandafter\def\csname\hashnumber{#2}\string_#1\string_hnh\endcsname{#3}}
\def\hashnumhashretrieve#1#2{\csname\hashnumber{#2}\string_#1\string_hnh\endcsname}
codenumhasharray.tex
% My own encoding for numerical keys, hoping to spread out hash
% codes. In particular, I'm trying to get different cs name
% lengths. I'm indebted to Bruno Le Floch for the neat way of ending
% the recursion without an \if construct.
\expandafter\def\csname numkeya0\endcsname{a1Y@}
\expandafter\def\csname numkeya1\endcsname{b}
\expandafter\def\csname numkeya2\endcsname{c2}
\expandafter\def\csname numkeya3\endcsname{dZ}
\expandafter\def\csname numkeya4\endcsname{e3'}
\expandafter\def\csname numkeya5\endcsname{f}
\expandafter\def\csname numkeya6\endcsname{g4!}
\expandafter\def\csname numkeya7\endcsname{h}
\expandafter\def\csname numkeya8\endcsname{i5-}
\expandafter\def\csname numkeya9\endcsname{j"}
\expandafter\def\csname numkeya;\endcsname\myencodeb{}
\expandafter\def\csname numkeyb0\endcsname{k6}
\expandafter\def\csname numkeyb1\endcsname{l;}
\expandafter\def\csname numkeyb2\endcsname{m7/}
\expandafter\def\csname numkeyb3\endcsname{n}
\expandafter\def\csname numkeyb4\endcsname{o8}
\expandafter\def\csname numkeyb5\endcsname{p(:}
\expandafter\def\csname numkeyb6\endcsname{q9}
\expandafter\def\csname numkeyb7\endcsname{r}
\expandafter\def\csname numkeyb8\endcsname{s0(}
\expandafter\def\csname numkeyb9\endcsname{t,}
\expandafter\def\csname numkeyb;\endcsname\myencodec{}
\expandafter\def\csname numkeyc0\endcsname{uO}
\expandafter\def\csname numkeyc1\endcsname{v)}
\expandafter\def\csname numkeyc2\endcsname{wP}
\expandafter\def\csname numkeyc3\endcsname{x<}
\expandafter\def\csname numkeyc4\endcsname{yQ=}
\expandafter\def\csname numkeyc5\endcsname{z}
\expandafter\def\csname numkeyc6\endcsname{AR}
\expandafter\def\csname numkeyc7\endcsname{B?>}
\expandafter\def\csname numkeyc8\endcsname{CS}
\expandafter\def\csname numkeyc9\endcsname{D}
\expandafter\def\csname numkeyc;\endcsname\myencoded{}
\expandafter\def\csname numkeyd0\endcsname{ET[|}
\expandafter\def\csname numkeyd1\endcsname{F}
\expandafter\def\csname numkeyd2\endcsname{GU}
\expandafter\def\csname numkeyd3\endcsname{H]}
\expandafter\def\csname numkeyd4\endcsname{IV}
\expandafter\def\csname numkeyd5\endcsname{J}
\expandafter\def\csname numkeyd6\endcsname{KW*}
\expandafter\def\csname numkeyd7\endcsname{L}
\expandafter\def\csname numkeyd8\endcsname{MX}
\expandafter\def\csname numkeyd9\endcsname{N+}
\expandafter\def\csname numkeyd;\endcsname\myencodea{}
\def\mynumcode#1{\expandafter\myencodea\number#1;}
\def\myencodea#1{\csname numkeya#1\endcsname\myencodeb}
\def\myencodeb#1{\csname numkeyb#1\endcsname\myencodec}
\def\myencodec#1{\csname numkeyc#1\endcsname\myencoded}
\def\myencoded#1{\csname numkeyd#1\endcsname\myencodea}
\def\codenumhashstore#1#2#3{\expandafter\def\csname\mynumcode{#2}\string_#1\string_cnh\endcsname{#3}}
\def\codenumhashretrieve#1#2{\csname\mynumcode{#2}\string_#1\string_cnh\endcsname}
This will generate a couple of TeX files and a shell script to run a number of tests comparing the performance of different cs name constructs. For example,
\mkhashexperiment{testhash1}{100000}{pre}{}{\number}
generates a TeX file testhash1.tex creating an array construct with 100000 successively numbered entries and 1000000 random retrievals, where the cs name is
\csname pre\number<key>\endcsname
for <key>s between 1 and 100000.
Comparing the running times of tests testhash1 to testhash6 above, I get the following on my laptop (quoting from testhash.log):
starting test "testhash1"; size 100000; prefix "pre";
postfix ""; coding \number) )
real 0m4.642s
user 0m4.296s
sys 0m0.068s
starting test "testhash2"; size 100000; prefix "";
postfix "post"; coding \number) )
real 0m3.827s
user 0m3.748s
sys 0m0.036s
starting test "testhash3"; size 100000; prefix "verylongprefixtostresshashtable
"; postfix ""; coding \number) )
real 0m16.614s
user 0m16.265s
sys 0m0.160s
starting test "testhash4"; size 100000; prefix "";
postfix "verylongprefixtostresshashtable"; coding \number) )
real 0m6.317s
user 0m6.176s
sys 0m0.072s
starting test "testhash5"; size 100000; prefix "verylong";
postfix "prefixtostresshashtable"; coding \number) )
real 0m8.971s
user 0m8.789s
sys 0m0.116s
starting test "testhash6"; size 100000; prefix "verylongprefixto";
postfix "stresshashtable"; coding \number) )
real 0m13.337s
user 0m12.753s
sys 0m0.112s
(Note that these are simple system timing results on my laptop computer, so they are probably slightly skewed by other processes on the system. Still the numbers are basically reproducible on repetition.)
The first (maybe surprising) result is that this amount of variation alone creates a huge difference in overall performance, up to a factor of 4 in running time. If it were possible to discount the fixed amount of additional processing (I used a counter assignment to access the array value), the difference would probably be much, much larger!
Now, I've heard that hash performance is particularly bad for sequences of digits, so \number is possibly not a good idea for encoding the array access key.
Bruno Le Floch gave a comment in which he describes how to optimize the number representation for hashing. This is implemented in the macro \hashnumber and tested by the test cases testhash19 to testhash24. The results:
starting test "testhash19"; size 100000; prefix "pre";
postfix ""; coding \hashnumber) )
real 0m2.395s
user 0m2.280s
sys 0m0.052s
starting test "testhash20"; size 100000; prefix "";
postfix "post"; coding \hashnumber) )
real 0m2.483s
user 0m2.260s
sys 0m0.048s
starting test "testhash21"; size 100000; prefix "verylongprefixtostresshashtabl
e"; postfix ""; coding \hashnumber) )
real 0m4.173s
user 0m3.740s
sys 0m0.056s
[snip]
This gives significantly better results than using \number:
- The best performance is almost twice that of \number (so 8 times better than the worst result for \number).
- The "spread" of performances between different cs name constructions is almost negligible; even the worst performance for \hashnumber is better than the best performance for \number.
Of course, I'm intrigued by how far this could still be enhanced by choosing a more clever encoding, but I have no clue how to go about this systematically.
In the hope of "spreading out" the hash keys for encoded numbers a bit more, I made an encoding function myself (\mynumcode; see the test suite above). It uses a lot of different characters, produces different key lengths and avoids repetitions, which hopefully leads to a better distribution of hash keys.
Using this encoding, I get the following running times:
starting test "testhash25"; size 100000; prefix "pre";
postfix ""; coding \mynumcode) )
real 0m2.194s
user 0m1.920s
sys 0m0.016s
starting test "testhash26"; size 100000; prefix "";
postfix "post"; coding \mynumcode) )
real 0m2.089s
user 0m2.020s
sys 0m0.040s
starting test "testhash27"; size 100000; prefix "verylongprefixtostresshashtabl
e"; postfix ""; coding \mynumcode) )
real 0m3.594s
user 0m3.008s
sys 0m0.036s
[snip]
You can see that the timing results are slightly better than for \hashnumber, but not significantly so. This is probably getting near optimal (for the hashing implementation of the TeX engine).
As the tests reported above are as near to testing the "raw performance" of the hash mechanism as conceivably possible (just storing and retrieving values; loading no additional macros and avoiding all computation), I dare to conclude the following:
- On the whole, storing array elements as separate macros with constructed cs names is feasible. Compared to the usual run times of TeX documents, the best "raw" times given above for storing 100000 array elements and retrieving random elements 1000000 times are OK, and are probably completely dwarfed by whatever further processing these 1000000 values receive in a real application.
- Indeed, using \number to encode a numeric key is not optimal for array performance.
- The solution given by Bruno Le Floch for a "hash-optimised" number encoding ensures significantly better hash performance at small computational cost (see also below).
- The total length of the cs name doesn't seem to be significant.
- When using \number, inexplicably, the overall performance of the array implementation is almost directly inversely proportional to the length of the "static" prefix in front of the array access key (much less so for the other number encodings).
Comparison of engines
The test results cited above are with the original TeX engine. I also ran the test suite with other engines:
- pdftex gives slightly lower performance overall, but the difference is really insignificant.
- xetex also gives slightly lower performance overall; the difference looks a little more significant (up to 1.5 times slower).
- With luatex the performance differences are more subtle: cases which are fast with TeX are sometimes significantly slower (up to 2 times), while cases where TeX is slow get faster. So it seems LuaTeX is better at dealing with those cases (long static prefix) where the original engine is bad.
Does everything get slower?
One more concern about using the internal hash table to store arrays as thousands of cs names is that it will slow down all processing due to
- the fact that the size of the hash table has to be extended for really huge amounts of data, so looking up any cs name will take more time;
- the fact that the generated cs names "clutter up" the hash table, leading to hash conflicts for regular macro names as well.
I tested this by regenerating the above test suite with \def\experimentsize{1000000} and using the following test file:
\documentclass{article}
%\input{testhash1-setup}
%\input{testhash25-setup}
\usepackage{pgfplots}
\let\rmdefault\sfdefault
\begin{document}
\newcommand\test[1]
{%
[can't repeat the example code here because of post length limit]
}
\test{1}
\test{2}
\test{3}
\test{4}
\test{5}
\test{6}
\end{document}
The XKCD plot example is taken from this answer. The intention behind using a TikZ example is that TikZ loads a lot of source code, defining and using heaps of macros, so if anything is wrong with hash performance for normal macro processing, this is one application where it should be most observable.
Note that the \input statements (commented out above) are executed before loading pgfplots, to maximise potential hash conflicts when loading TikZ.
I had to extend the internal tables as follows to compile the example file with one of the \input statements uncommented:
export hash_extra=1500000
export max_strings=2000000
export pool_size=30000000
First, let's look at the effect of the larger hash table alone. If I compile the example with both \input statements commented out, using time pdflatex without any table extensions, I get the following result:
real 0m3.686s
user 0m3.628s
sys 0m0.044s
With the abovementioned extensions in place, I get
real 0m3.746s
user 0m3.684s
sys 0m0.052s
I would say this is a noticeable (also reproducible) slowdown, but totally insignificant for practical purposes.
Now, the timing result with \input{testhash1-setup} uncommented:
real 0m19.760s
user 0m19.585s
sys 0m0.140s
and with \input{testhash25-setup} uncommented:
real 0m10.710s
user 0m10.585s
sys 0m0.104s
And here are the timing results for "only" processing testhash1-setup, using the following test file:
\documentclass{article}
\input{testhash1-setup}
\begin{document}
\end{document}
(Note there will be a small overhead due to starting up LaTeX twice.)
real 0m15.401s
user 0m15.297s
sys 0m0.084s
And for testhash25-setup:
real 0m4.218s
user 0m4.104s
sys 0m0.104s
Summing the run times of the separate computations and comparing with the joint computation, I get
- For testhash1-setup, the sum of both "separate" computation times (user timing) is 18.981s, while the joint computation takes 19.585s.
- For testhash25-setup, the sum of both "separate" computation times is 7.788s, while the joint computation takes 10.585s.
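As a sanity check on this arithmetic, a trivial sketch (the figures are the user times quoted above):

```python
# user times from the runs above
tikz_alone = 3.684    # TikZ example alone, no array setup
hash1_alone = 15.297  # testhash1-setup alone
hash1_joint = 19.585  # testhash1-setup + TikZ in one run
hash25_alone = 4.104  # testhash25-setup alone
hash25_joint = 10.585 # testhash25-setup + TikZ in one run

# extra cost of running both in the same TeX job
print(round(hash1_joint - (hash1_alone + tikz_alone), 3))   # → 0.604
print(round(hash25_joint - (hash25_alone + tikz_alone), 3)) # → 2.797
```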
Interestingly, this means that when the array key is encoded with \number, the subsequent computation is not slowed down significantly, while with my faster encoding, the slowdown is on the order of the original running time of the TikZ example!
So this example actually allows us to observe the effect of hash table cluttering: as my encoding tries to spread the hash codes for array elements across the whole hash table, the number of hash conflicts for regular macro names increases, and indeed the performance is halved.
The results are comparable for the encoding by Bruno Le Floch.
Note however that the experiment setting is really not very realistic, because loading the array code before TikZ comes down to setting up data structures before defining the actual document macros, while usually it will be the other way round: First, macro packages and suchlike are loaded and later during document processing, data is stored in arrays which are defined on the fly.
So what happens when I move the \input{testhash25-setup} after the \usepackage{pgfplots}?
real 0m8.928s
user 0m8.809s
sys 0m0.100s
So the slowdown is still noticeable, but much less than when executing the other way round.
Hence, my conclusions from these experiments are:
- Extending the internal hash table alone doesn't slow TeX down.
- Cluttering up the hash table with macros for array elements does slow TeX down noticeably.
- It makes a difference when macros are defined; the basic packages should come first and the definition of array elements should come as late as possible.
- When deciding how to encode array indices, all effects should be taken into account. It makes no sense to trade slightly better hash performance for vastly slower overall run time.
We haven't even talked about arrays yet…
Indeed, so far all tests were at the "raw hash" level, just executing the pre-expanded store and retrieval commands. In a realistic application setting, of course the encoding of array keys has to take place at execution time.
That's the purpose of the macro \mkarrayexperiment in the test suite above: it generates a test file containing statements like
\numhashstore{foo}{1}{1}
\mycount \numhashretrieve{foo}{438549}\relax
storing and retrieving values to/from an array foo with the access macros defined above.
From the given test suite, I get the following results:
starting test "testnumhasharray"; size 100000; array implementation "numhasharr
ay" (./numhasharray.tex)) )
real 0m5.472s
user 0m5.192s
sys 0m0.048s
starting test "testhashnumhasharray"; size 100000; array implementation "hashnu
mhasharray" (./hashnumhasharray.tex)) )
real 0m3.754s
user 0m3.664s
sys 0m0.032s
starting test "testcodenumhasharray"; size 100000; array implementation "codenu
mhasharray" (./codenumhasharray.tex)) )
real 0m5.114s
user 0m4.868s
sys 0m0.048s
Here we can see that the effort of encoding the array keys narrows the gap between using \number and the hash-optimized approach of Bruno Le Floch. My own encoding is disqualified: the effort of generating the array keys outweighs the added hash efficiency 🙁
But the good news is that even taking the time needed for encoding into account, the approach of Bruno Le Floch beats the array index based on \number.
Are there other approaches?
I'm interested to hear whether there are other approaches to implementing an array, apart from the hash-based one studied here, that have comparable performance.
The l3prop package offers an implementation of "property lists" which can be accessed like arrays. But in a test similar to the above, I got performance which is totally incomparable to the hash-based implementation (100 times slower for 1000 array entries and 10000 retrievals; it might even get worse for larger array sizes). So this doesn't seem to be an alternative for storing huge amounts of data.
I tried to devise a simple macro-based solution where all entries are stored in a single macro expansion, but unfortunately I couldn't think of any way for retrieving the value of an array entry with any kind of efficiency (I thought delimited arguments could be used for that, but couldn't figure out how).
But let me speculate a little. Let's talk about n array entries with a total content length of m; conceivably, m >> n.
Then all solutions for storing the n entries that I can imagine fall into two categories:
- New cs names on the order of n are created. This is the solution sketched above, but also, for instance, all kinds of dictionary implementations where nodes of a tree are stored as single macros.
- At least one macro whose expansion text is on the order of m is created.
Clearly, no solution based on the first approach will be more efficient than the simple implementation given above, because that one needs exactly one access to the internal hash table, and you can't go lower than that.
But I think that also no solution based on the second approach can be more efficient, because however you implement it, to retrieve an array element, you'll have to expand a macro whose expansion text is on the order of m, and that is clearly slower than a single access to the internal hash table, so there.
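For illustration, here is how the two categories look in a toy Python model (the key/value scheme is made up for the sketch): category 1 keeps one hash-map entry per element, category 2 packs everything into one string that must be scanned on each retrieval:

```python
n = 100000

# Category 1: one entry per key (analogue of one csname per array element).
as_map = {str(k): str(k * k) for k in range(n)}

# Category 2: the whole array in a single string (analogue of one macro
# whose expansion holds all entries), retrieved by scanning for a
# delimited key, roughly like a delimited-argument match in TeX.
as_blob = "".join(f";{k}={k * k}" for k in range(n)) + ";"

def get_blob(key: str) -> str:
    i = as_blob.index(f";{key}=") + len(key) + 2
    return as_blob[i:as_blob.index(";", i)]

# Both return the same value, but the map lookup is O(1)-ish while the
# blob scan is O(m) in the total content length.
assert as_map["98765"] == get_blob("98765") == str(98765 * 98765)
```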
But maybe I'm just lacking imagination, so feel free to present a third solution, or in fact any array implementation more efficient than the simple hash-based one.
Final questions
Based on what I know so far, my basic question can be refined as follows:
- Looking at the actual implementation of TeX's hashing algorithm, what is the best way to construct cs names for maximum array performance?
- Is there any fundamentally different method of implementing an array in TeX which gives comparable (or even better) performance than the hash-based one?
- Am I somehow looking into the wrong direction? Maybe my experiment setup is somehow fundamentally flawed and I'm getting insignificant results? Should I test differently?
- I used the original TeX engine for my tests and am not using any construct specific to a particular TeX extension. I tested the very same code with other engines, without any unexpected or significantly different results. It is rather clear that LuaTeX makes it possible to implement arrays in Lua, giving completely different options. Do the other engines (pdftex, XeTeX) offer anything which might aid in implementing arrays?
Best Answer
So, first let me say that this is probably the most thoroughly researched question I've read so far. Congratulations.
I'll be reusing some of David's comments. The hashing algorithm takes the csname as an array of bytes (probably some differences there for XeTeX and LuaTeX; let me focus on 8-bit engines) and computes the sum of csname[i]*2^(len-i) over all i, modulo the hash prime, where csname[i] denotes the i-th byte in the csname and len is the total number of characters. The control sequence is then stored or retrieved at the resulting position in the hash table. If there were never any hash collisions, storage and retrieval would be easy and fast, since computing the hash is very fast (additions only). When there is a collision, TeX must check whether the command it found in the hash table is indeed the csname that was searched for: this is done by comparing strings one character at a time, and is somewhat slower than the hashing algorithm (I think). TeX may need to compare all csnames with the same hash (it can stop once it finds the csname it was looking for).
When comparing strings which happen to be different but have the same hash, TeX goes through them one character at a time, and can stop as soon as it finds a difference: there lies the difference between a long prefix and a long suffix. In scenarios where there are many hash collisions (e.g., with \number), most of the time is spent comparing strings which happen to have the same hash. If such strings differ on an early character (i.e., short prefix), they will be recognized as different pretty fast. If they have long common prefixes, TeX will have to scan them for a long time (until the first difference) before it can declare them distinct.
Why is \number bad for hash collisions? Say that your array keys are 5-digit numbers. Compared to the hash of 00000, the hash of abcde (where a, etc., stand for digits) is 16*a+8*b+4*c+2*d+e. For instance, 98765 has the same hash code as 00000 plus 16*9+8*8+4*7+2*6+5 = 253. This is tiny: in fact, the hash codes of all sets of five digits lie among just 280 possible values! A typical hash value thus corresponds to more than 300 csnames, and on average TeX will have to test half of them before finding the one it wants.
How do we spread out hashes? The main culprit in having only 280 different values with \number is that Knuth chose a multiplier equal to 2 only. This is justified when names are more or less random sequences of letters of varying length, as in typical TeX documents (adding one character multiplies the whole hash value by 2, so collisions between strings of different lengths are not particularly likely, regardless of their internal structure). To choose my encoding, I decided to replace 16*a+8*b+4*c+2*d+e above by 16^4*a + 16^3*b + 16^2*c + 16*d + e. This is simply done by inserting three dummy characters between the decimal digits, turning 98765 into 9---8---7---6---5. Why insert three rather than more? Because 16 is the smallest power of 2 greater than 10. Choosing 8 instead of 16 would still lead to hash collisions: the hashes of 0--0--0--0--0 and 9--9--9--9--9 are only 9*8^4+9*8^3+...+9 = 42129 apart, so a typical hash would correspond to two or three csnames.
The last step was simply to convert this to code (the \step/\endstep/\hashnumber macros shown in hashnumhasharray.tex above).
Here I use a typical technique to optimize loops: define the looping macro to do what you want and call itself again (here we want #1 -> #1---), then figure out what to feed as #1 to break the loop. This avoids any test, hence is as fast as can be, while supporting input of arbitrary length. Note also that I am putting the dashes after the digit (#1--- instead of ---#1) in an effort to put the relevant stuff early in the csname.
For the specific application of inserting stuff within numbers, we might be able to do better (I haven't benchmarked): numbers will not have more than 10 digits (actually 6 might be a reasonable bound, so remove #6 through #9). The following should work and may be faster (use at least 8 stars after the number, to support numbers of arbitrary lengths between 1 and 10).
Yet another option is to use \romannumeral on each digit, which reduces hash collisions because the lengths vary. I don't know how this fares.
One thing is certain: csnames of the form <prefix><varying><postfix> of constant length can have 256*2^(len-1) different hashes (where len is the length of the varying part), and more realistically only 10*2^(len-1) hashes. Getting 10^5 distinct hashes absolutely requires 10 varying bytes, or some variation in the length of the prefix, varying, or postfix parts. In practical settings (where key encoding is taken into account, hence where it is difficult to get more than 10 different leading bytes), one needs about 15 bytes of varying material. My \step approach uses 20 (or 17, depending on whether you count the trailing ---). My \dashes approach may use various numbers, depending on whether it is tailored to a specific number length.
Here are ways to implement arrays, depending on your use case. First, let us focus on arrays which can contain arbitrary tokens.
- \toks registers. If you are defining and using your array in a controlled environment, use \toks registers within a group. This is limited to 32768 registers (65536 for LuaTeX, I think). Access is faster than csnames, and there is no need for tricks. In fact, this technique can also be used if your array is defined somewhere else in another form, but you need to access it a lot at a given place (within controlled code). This (is|has been) used in l3sort for merge sort.
- Multiple \csname approach. This is what you do, and it has been discussed at length here. There is no limit on the number of slots, and keys may be other than numeric, but for numeric keys some trickery is needed to avoid hash collisions. Also, you cannot have many such arrays in memory at the same time, lest you fill your hash table completely.
- A single \csname containing everything. This is essentially what l3prop does at the time of writing. Mapping through all keys is straightforward and fast (per element). Accessing individual elements in an arbitrary order will be very slow, since the whole array needs to be expanded from a macro every time we want to access one element.
- Hybrid approaches may be better if your access to given items happens in clusters (e.g., you want to look at keys 1–2000, then 1000–3000, then 2000–4000): you could allocate a certain range of \toks registers for this particular purpose and update once in a while which items are stored in this range. You could also store most stuff in a single csname and only send some items to individual csnames or toks registers.
Another type of array has numeric values, be it integers, dimensions, and the like.
- Various registers can be used in the same way as \toks registers above. A typical example of this is l3regex, which at the time of writing abuses toks, dim, skip, and muskip registers extensively to store all sorts of information. Do not use integer registers, as stuff like \m@ne would break, but expect other stuff to break too (\maxdimen, ...). Using dimension or skip or muskip registers can get messy when there is a need to convert from one to the other: of course, we use stretch and shrink components too, which leads to \number\glueshrink\mutoglue\muskip3\relax -type constructions.
- A more practical set of numbers is given by the \fontdimen parameters of a font. In effect, those behave like a usual C array whose size must be declared before starting to use it, and whose items are TeX dimens, i.e., integers less than 2^30 in magnitude. All assignments are global. Retrieval speed is comparable to other registers, perhaps a bit slower. See a question I asked on this topic for some other comments.
- Of course, the toks and csname approaches described earlier work fine for numeric data too, but they seem inefficient compared to \fontdimen, especially concerning memory usage (each \fontdimen parameter uses just a few bytes of memory, versus many bytes for the csname and many more for the tokens representing the value).
On a similar topic of optimizing data structures:
- A long string of characters is stored more efficiently as a csname than as a list of tokens. This has been used at some point in l3fp, storing 10000 decimals of 1/(2pi) as \159154943091895..., to be passed to \string when needed.
- Trees which only contain numeric data and only need to be accessed non-expandably can be stored and manipulated as nested boxes, with data stored as various kern nodes, for instance. I played with this to implement sorting algorithms some time ago, but the code was messier than the current l3sort and not significantly faster. I'll look at it again some day. It might be possible to reuse this for the relationship between document elements.