[TeX/LaTeX] Compile extremely large file into PDF

Tags: compiling, memory, pdftex, performance

I am using Django and its templating system to dynamically generate LaTeX, which I then compile with pdflatex. This works great.
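For illustration, the rows are generated by a template along these lines (the variable and field names here are placeholders, not from my actual project):

{% for item in items %}
{{ item.name }} & {{ item.quantity }} & {{ item.status }} \\
{% endfor %}

One thing to watch: Django's autoescaping targets HTML, so LaTeX special characters such as & and % in the data have to be escaped separately.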

The problem arises when the generated LaTeX is 30,000+ lines long: compilation takes 2+ seconds, doesn't produce proper PDF files (i.e. tables are missing), and sometimes crashes pdflatex altogether (the error message is: run out of memory).

Is there a way I can improve the performance and/or avoid compilation crashes?

There is nothing wrong with my LaTeX, because when I scale down the size of the generated LaTeX everything works fine. It's only when the quantity increases that problems start to arise.

I have done some research, and the majority of the answers talk about tricks for fast compilation, such as externalising plots and compiling only parts of the document. These tactics don't apply to my situation: I am not plotting (it's just massive chunks of text, table rows to be specific), and I can't compile individual sections, because the majority of the content is dynamically generated each time.

Edit 1:

@David Carlisle
I am using "tabulary" to generate the tables, as it's the only way I could get the columns to automatically adjust their widths as appropriately as possible.
Yeah I would have though the memory footprint would be small still, but it seems to be massive.
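For context, a minimal tabulary table looks like this (as I understand it, tabulary has to measure every cell before it can balance the column widths, so the entire table body is held in memory at once; the column spec and contents here are illustrative):

\documentclass{article}
\usepackage{tabulary}
\begin{document}
% L, C, R columns stretch to fit their contents within \linewidth
\begin{tabulary}{\linewidth}{LCR}
Asset ID & Description of maintenance task & Status \\
1042 & Replace the pump seal on unit 3 & open \\
\end{tabulary}
\end{document}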

@egreg and @David Carlisle
For a document where the LaTeX is around 30,000 lines long I don't get any error messages (log file or console), but I do get heaps of "Overfull/Underfull hbox" warnings (which I think are harmless).

As for a LaTeX file that causes pdflatex to crash, the error I get is:

Runaway definition?                                                          
->\global \let                                                               
! TeX capacity exceeded, sorry [main memory size=7000000].                   
\CT@setup ...\CT@color {\global \let \CT@do@color                            
                                                  \CT@@do@color \color }     
l.94583 \end{tabulary}                                                       

If you really absolutely need more capacity,                                 
you can ask a wizard to enlarge me.                                          


Here is how much of TeX's memory you used:                                   
 1708 strings out of 495028                                                  
 23123 string characters out of 6181498                                      
 7000001 words of memory out of 7000000                                      
 4949 multiletter control sequences out of 15000+600000                      
 4782 words of font info for 16 fonts, out of 8000000 for 9000               
 14 hyphenation exceptions out of 8191                                       
 34i,8n,32p,232b,270s stack positions out of 5000i,500n,10000p,200000b,80000s
!  ==> Fatal error occurred, no output PDF file produced!
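For what it's worth, the "ask a wizard to enlarge me" hint apparently refers to TeX's configurable memory pools. On TeX Live these can be raised in texmf.cnf, though the exact value needed here is a guess on my part:

% in texmf.cnf (locate it with: kpsewhich texmf.cnf)
main_memory = 12000000

After editing, the formats have to be rebuilt (fmtutil-sys --all on TeX Live) for the new limit to take effect.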

Edit 2:

I know essentially nothing about the internals of how LaTeX is implemented, so I based my answer on general knowledge of computer science and software development. From that perspective, using tables with this amount of sporadic data exposes the poor performance of the underlying implementation (a leaky abstraction). Hence my suggestion to avoid using tables.

The reason there are 10,000+ rows is that the text/data is for a computerised maintenance management system. The documents will not be read by a wide audience and do not need to be overly "pretty"; they just need to be legible, which is another reason I suggested avoiding tables.

Yes, every row will have a different shape, which might add to the explanation of why I am having performance problems.

Best Answer

TeX itself has no problem with output of this size: the following code prints ten million lines, or 217,392 pages on my laptop:

\documentclass{book}
\usepackage{luatexbase, luacode}

\begin{document}

\begin{luacode}
-- each argument to tex.print() is fed to TeX as a separate input line;
-- the \par after every batch ends the paragraph, so TeX can ship out
-- completed pages as it goes and memory use stays flat
for i=1,10000000 do
   tex.print('x','y','z', '\\par')
end
\end{luacode}

tests
\end{document}

Actually, the limit of ten million lines was arbitrary, and I am sure you can double it. You are only limited by your system's ability to handle large log files and PDFs. Try it first with one zero fewer in the loop.

As long as you issue paragraphs or page breaks, TeX will cope with any amount of output, provided your system can handle the resulting PDF.

So my advice is to build the table line by line and issue each row as a paragraph (more or less, that is how longtable works); a sketch of the idea follows.
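Here is a rough sketch of "one row = one paragraph", assuming three fixed-width columns (the widths and the \row macro name are my own choices, adjust to taste). Because every row is a complete paragraph, TeX can ship out finished pages as it goes instead of holding the whole table in memory:

\documentclass{article}
% one row = one paragraph made of three fixed-width boxes
\newcommand\row[3]{%
  \par\noindent
  \makebox[0.2\linewidth][l]{#1}%
  \makebox[0.6\linewidth][l]{#2}%
  \makebox[0.2\linewidth][l]{#3}\par}
\begin{document}
\row{Asset ID}{Description}{Status}
\row{1042}{Replace the pump seal on unit 3}{open}
\row{1043}{Inspect conveyor belt alignment}{closed}
\end{document}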

I have written the example in LuaLaTeX to help you with the loop. It will perform even better in plain TeX with a simple \loop..\repeat macro; a sketch of that is below.
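A sketch of the plain TeX version, to be compiled with pdftex rather than pdflatex (the count register name and the loop bound are my own choices):

% count register tracks how many lines have been emitted
\newcount\n
\loop
  x y z \par
  \advance\n by 1
\ifnum\n<1000000 \repeat
\bye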