[TeX/LaTeX] TeX capacity exceeded (string characters) (well, it’s about to be exceeded)

errors imakeidx indexing memory xindy

As in my earlier question, TeX capacity exceeded (save size), I am generating a fairly large index of .txt, .tex, and .pdf files. This process has worked for a long time, and the size of the index is not the issue. I have added logic to insert additional links in the index, and I am now about to exceed the string characters limit:

Here is how much of TeX's memory you used:
     134371 strings out of 493308
     9799992 string characters out of 9887816 
     4157963 words of memory out of 5000000
     59945 multiletter control sequences out of 15000+600000
     167812 words of font info for 205 fonts, out of 8000000 for 9000
     959 hyphenation exceptions out of 8191
     87i,12n,83p,10548b,16447s stack positions out of 5000i,500n,10000p,200000b,80000s

I have exceeded this limit several times in the past few months and have optimized much of my code, but I am running out of ideas as to where to optimize further. So I am trying to resolve this before I get stuck again, which seems imminent.

As a benchmark, processing a single file results in:

Here is how much of TeX's memory you used:
     52846 strings out of 493308
     1137494 string characters out of 9887816
     1523413 words of memory out of 5000000
     54776 multiletter control sequences out of 15000+600000
     167812 words of font info for 205 fonts, out of 8000000 for 9000
     959 hyphenation exceptions out of 8191
     87i,11n,83p,10548b,1042s stack positions out of 5000i,500n,10000p,200000b,80000s

Notes:

  • The string characters count sometimes decreases slightly for no apparent reason. Since my indexing processes a list of files, I cannot see how adding another file to this list could possibly reduce the number of string characters required, but I have seen this happen.

  • I have already increased the pool_size in my /usr/local/texlive/2013/texmf.cnf:

    pool_size=10000000
    
  • Apologies for not providing an MWE. This process accesses my file system, and duplicating it would require setting up a bunch of files.

Questions:

  1. What are string characters?
  2. What kinds of constructs should be avoided, and what kinds are preferred, to minimize the use of string characters?
  3. Is there a way to see the number of string characters used in the middle of the run? If I could get this report before and after a macro invocation, I could possibly narrow down the main culprit.
  4. What does it take to increase this limit much further?

A solution to Question 4 would be ideal. I don't mind things taking longer. It currently takes about 40 minutes to generate the 600-page index, but even an overnight solution would be acceptable, so there is plenty of room left in terms of run time.

Best Answer

It's the pool size: mostly the letters used in command names. You can usually increase it in texmf.cnf (unless you reach the compiler limits):

% Max number of characters in all strings, including all error messages,
% help texts, font names, control sequences.  These values apply to TeX.
pool_size = 6250000

A simple test file (plain pdftex):

\tracingstats1
aaa

\def\a{aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
\def\b{\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a}
\count0=0
\loop
\advance\count0 1
\expandafter\def\csname \b \romannumeral\count0 \endcsname{}
\iftrue\repeat

This keeps creating new csnames until it fills the pool, but it seems I can keep raising the limit in texmf.cnf.

The effective pool size seems to keep increasing up to a setting of

pool_size = 100000000

which produces the error

! TeX capacity exceeded, sorry [pool size=39921916].
 39921697 string characters out of 39921916

If I try to increase it further in texmf.cnf, I get no warning, but the reported size doesn't change. Even so, that limit is approximately four times the value you report.
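
To see which constructs actually cost pool space, a bounded variant of the test file may help. This is only a minimal sketch: it assumes a pdftex format with the e-TeX extensions enabled (needed for \ifcsname), and \prefix and \n are made-up names for illustration. Run it once as written and once with variant B swapped in, then compare the "strings" and "string characters" lines in the two logs. The \csname variant should show roughly one extra string per loop iteration, while the \ifcsname variant should not.

```tex
% Plain pdfTeX sketch. Assumes e-TeX extensions are available (\ifcsname).
\tracingstats=1

% Hypothetical long prefix, standing in for a long index key.
\def\prefix{a-long-index-key-prefix-used-only-for-this-test-}

\newcount\n
\n=0
\loop
  \advance\n by 1
  % Variant A: \ifcsname only looks the name up; an undefined name
  % is not turned into a new control sequence.
  \ifcsname\prefix\romannumeral\n\endcsname\fi
  % Variant B: comment out the line above and enable the line below.
  % \csname makes every undefined name a new control sequence (meaning \relax).
  % \expandafter\ifx\csname\prefix\romannumeral\n\endcsname\relax\fi
\ifnum\n<10000 \repeat

\bye
```

If the two logs differ as expected, testing for existence with \ifcsname instead of \csname, and keeping generated names short, are the kinds of changes likely to reduce string character usage.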
