Background
I have been busy with a book about the world's scripts and languages, on and off, for a couple of years. It covers all the scripts listed in Unicode, as well as a dozen or so for which there is no Unicode standard yet; for those I use images instead of fonts.
With the Noto fonts, Unicode-aware LuaLaTeX, and l3 maturing, I have been able to print a reasonable range of all scripts, as needed in the write-up, with the exception of the East Asian scripts, of which I have only a few pages per script. I use Brill as my main font and have added fallback fonts to cover the rest of the scripts. The book so far hovers around 350 pages, and I anticipate that it will run to a final size of 600 pages. To cover the Unicode points, the fonts need to provide roughly 150,000 glyphs. Not all codepoints are used in the book; as I mentioned earlier, I estimate I need only about half of that. Obviously and understandably, compilation speed is an issue, so I am trying to understand the algorithm used by luaotfload-fallback.lua to see whether I can improve the processing time. I am looking at strategies to optimize compilation times, not only for my document but in general.
I have identified bottlenecks in three main areas: a) fonts, b) images, c) logging (disk writes in general). For the images I will use a preprocessor to optimize all of them and produce PDFs; I will write it in Golang, which can also do marking if needed. Ideas for fonts and logging are below.
-
I have this (crazy) idea that the glyph info required at the nodes during processing be obtained from a local server, so that some tasks can be externalized and run concurrently. I am thinking of some form of priority queue, so that data for frequently used codepoints can be served fast, and codepoints unused on a second run can be dropped from the cache. Again, I will use Golang and sqlite3, since everything is local. At the moment I have a Lua table that maps Unicode points to fonts, based on a config file.
-
All logging would also be sent to a server rather than written to disk. The same can be done for the aux files.
-
Generating the PDF also takes time, but I am undecided at this point whether it can be optimized.
Current compilation speed is about 1.3 seconds per page, plus an initial 30–40 seconds; at the projected 600 pages, that is roughly 13 minutes per full run.
Question
Can someone explain the algorithmic steps in luaotfload-fallback.lua? When and how is it used by LuaTeX when building a document? At which point is the glyph info needed? Any ideas welcome. Thank you for reading this far.
Best Answer
This doesn't answer the question in the title at all, but I think that it addresses the issues presented in the question body (hopefully).
Indirect Answer
Here's a solution that loads 231 unique fonts and prints 83,020 unique characters (103 pages) in 7.505 seconds (on average) using LuaLaTeX.
First, run this script to download all the fonts:
Then, place the following in all-characters.lua:

Then, you can print all the characters using the following document:
ConTeXt is 50% faster at 4.849 seconds on average:
More usefully, this also defines a virtual font \allcharactersfont that contains characters from all the loaded fonts:

Direct Answer
The document below loads all 231 fonts in 2.426 seconds on average, so there's not much room to speed up the font loading.
If you did still want to speed it up, the easiest way would be to place the font files and luaotfload caches in a RAM disk.

Aside from some package initialization spam and overfull box warnings, your document shouldn't be producing that much log output. If you do have that much output, then I'd try to reduce the amount of output rather than trying to optimize it.
Disabling PDF compression can help a little, but 1.3 seconds per page suggests that something else is going on.
Another common issue is complicated TikZ figures, so if you're drawing any glyphs with TikZ then you should externalize and cache them.
Loading images can also be slow, so if you're loading a bunch of characters as individual files, then it's quite a bit faster to combine them all into a single PDF file and select the character by page number. pdfTeX (and maybe LuaTeX too?) closes each opened PDF file after every page, so it's much faster to load all the pages/characters into individual boxes at the start of each run than it is to reload the PDF file each time. (Or better yet, see the suggestion below.)
If you have the character images available as SVG files, then my (unreleased/experimental) unnamed-emoji package solves almost this exact problem. There's a little bit of end-user documentation, but for actually building the “font” files you'll need to use the Makefile as a rough guide.