[TeX/LaTeX] Why is biber so slow?

biber, biblatex

I have a simple document, about 20 pages long, with some mathematics and a couple of TikZ diagrams. I use biblatex and biber (biber 0.9.9, MacTeX 2011) to compile my references, of which there are currently 2, with maybe 10 citations.

When I use the BibTeX backend, the bibliography stage of my compilation takes under a second. Biber takes over 5 seconds to process exactly the same file.

I use TeXShop, and with biber the console window appears, but none of biber's command-line output shows up for at least 4 seconds.

Is there a problem with my setup or is biber slow by design?

EDIT: Yes, it is slow every time. I've done some digging into the folders where biber unpacks its Perl dependencies, and I think TeXShop might be unpacking it every time. Perhaps something deletes the unpacked binary after each use.

To make TeXShop use biber, I changed the BibTeX engine field in the preferences to `biber`.

Best Answer

Biber is slower than BibTeX, which is written in C, even if you discount the first-run unpacking. Bear in mind that biber also does a lot more than BibTeX; the two are hardly comparable in functionality. Your TikZ diagrams and mathematics should make no difference to biber. If your cache is being deleted every time you run, that will make a huge difference. This is easy to check: delete the cache and run biber twice. Is the second run any faster?
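A minimal sketch of that check, assuming Python 3.9+ and a `biber` executable on the PATH. The cache location is not given here and varies by installation, so `cache_dir` is a hypothetical parameter you would point at your own biber cache folder; `time_runs` is likewise an illustrative helper, not part of biber.

```python
import shutil
import subprocess
import time
from pathlib import Path
from typing import Optional, Sequence


def time_runs(cmd: Sequence[str], runs: int = 2,
              cache_dir: Optional[Path] = None) -> list[float]:
    """Run `cmd` `runs` times and return the wall-clock seconds of each run.

    If `cache_dir` is given (hypothetical: use your own biber cache path),
    it is deleted before the first run, so the first timing includes any
    unpacking work and later timings show whether the cache is reused.
    """
    if cache_dir is not None and cache_dir.exists():
        shutil.rmtree(cache_dir)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, after pdflatex has written `mydoc.bcf` (a placeholder job name), `time_runs(["biber", "mydoc"], cache_dir=Path("/path/to/cache"))` returns two timings. If the cache survives between runs, the second should be noticeably shorter; if both are equally slow, something is probably removing the cache.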

The main overhead is sorting. It is a complex business, dealing with much more than BibTeX does: Unicode 7.0 collation, sort direction per field, case sensitivity per field, and so on. The next largest overhead is uniqueness processing, which is again complex. BibTeX probably does about 20% of what biber does; see the biber PDF manual to get a sense of biber's share of the biblatex work.
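To give a feel for what "direction per field, case per field" means, here is a small sketch of a multi-field sort where each field carries its own direction and case rule. It is not biber's implementation: `SortField`, `primary_key`, `sort_entries` and the field names are hypothetical, and `primary_key` is only a crude stand-in for a Unicode collation key (it strips accents and optionally casefolds), far simpler than the tailored UCA collation biber performs.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class SortField:
    name: str                   # entry field to sort on (hypothetical names)
    descending: bool = False    # per-field sort direction
    case_sensitive: bool = False  # per-field case handling


def primary_key(text: str, case_sensitive: bool) -> str:
    """Crude approximation of a collation key, NOT the UCA:
    decompose accented letters (NFD), drop combining marks,
    and casefold unless the field is case-sensitive."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base if case_sensitive else base.casefold()


def sort_entries(entries: list[dict], scheme: list[SortField]) -> list[dict]:
    """Sort by several fields, each with its own direction and case rule.

    Python's sort is stable (also with reverse=True), so sorting by the
    least significant field first and the most significant field last
    yields the combined multi-field ordering.
    """
    result = list(entries)
    for field in reversed(scheme):
        result.sort(
            key=lambda e, f=field: primary_key(str(e.get(f.name, "")),
                                               f.case_sensitive),
            reverse=field.descending,
        )
    return result
```

With entries by "Émile", "adams", "Adams" and "Zola", sorting by author (ascending, case-insensitive) then year (descending) puts the two Adams entries together with the newer one first, and files "Émile" under E. A plain code-point sort would instead put "Zola" before "adams" and push "Émile" to the end, which is roughly the gap between naive and Unicode-aware sorting.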

As of version 2.5 (currently in development), I have done some profiling with NYTProf. The majority of biber's time is spent inside the Unicode::Collate module (written in C), as one would expect, since sorting is a main focus and tailored UCA sorting (which BibTeX doesn't even come close to doing) is expensive. After examining the call stacks, I've tidied some of the loops around the sorting calls, and biber 2.5 is now about four times as fast as 2.4, and probably as all earlier versions.
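The answer does not say what the loop tidying consisted of. As a general illustration only (not biber's actual change), one common cost in sort loops is recomputing an expensive key, such as a collation key, inside every comparison instead of once per item; in Perl the usual fix is the Schwartzian transform. This sketch counts key computations both ways; `CountingKey` and the two sort helpers are hypothetical, and the normalization call merely stands in for an expensive collation-key computation.

```python
import functools
import random
import unicodedata


class CountingKey:
    """Wraps a stand-in 'expensive' key function and counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        # Stand-in for an expensive collation-key computation.
        return unicodedata.normalize("NFD", text).casefold()


def sort_with_comparator(words: list[str], key_fn) -> list[str]:
    """Recomputes the key for both operands on every comparison."""
    def cmp(a: str, b: str) -> int:
        ka, kb = key_fn(a), key_fn(b)
        return (ka > kb) - (ka < kb)
    return sorted(words, key=functools.cmp_to_key(cmp))


def sort_with_cached_keys(words: list[str], key_fn) -> list[str]:
    """Computes each key exactly once (decorate-sort-undecorate)."""
    return sorted(words, key=key_fn)


if __name__ == "__main__":
    words = [f"entry{i:04d}" for i in range(1000)]
    random.Random(0).shuffle(words)
    slow, fast = CountingKey(), CountingKey()
    assert sort_with_comparator(words, slow) == sort_with_cached_keys(words, fast)
    print(f"comparator: {slow.calls} key computations; cached: {fast.calls}")
```

Both versions produce the same order, but the comparator version computes roughly two keys per comparison (on the order of n log n in total), while the cached version computes exactly one per item.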

As mentioned in the documentation, for performance testing I use a 2,150-entry, 15,000-line .bib file that references a 630-entry macro file, producing a bibliography of about 160 pages. In biber 2.4 this takes about 2 minutes to process; in the current 2.5 development version it takes about 28 seconds. This is now almost the same as with the `--fastsort` option, which doesn't use Unicode collation, so I may drop `--fastsort`: it is functionally far less useful, and if there is no performance benefit, there is no longer any point in it.
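For scale, the two timings quoted above (about 2 minutes, i.e. 120 s, versus about 28 s for the same 2,150-entry file) work out as follows; these figures are derived only from those numbers and agree with the "about four times as fast" estimate.

```latex
\[
\text{speedup} \approx \frac{120\,\mathrm{s}}{28\,\mathrm{s}} \approx 4.3,
\qquad
\frac{120\,\mathrm{s}}{2150\ \text{entries}} \approx 56\,\mathrm{ms/entry},
\qquad
\frac{28\,\mathrm{s}}{2150\ \text{entries}} \approx 13\,\mathrm{ms/entry}.
\]
```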
