A LaTeX log analyzer application (visualizing TeX expansion)

Tags: debugging, expansion, logging, macros, tex-core

This question led to a new package:
unravel

Consider the following MWE, test.tex:

\documentclass[12pt]{article}

\begin{document}

\tracingassigns=1
\tracingmacros=1

\def\aaa{something}
\def\bbb{else \aaa, else}
\edef\ccc{third \bbb, level}

\tracingassigns=0
\tracingmacros=0

\end{document}

If you build this with pdflatex test.tex, the logfile test.log contains something like this (line breaks added for legibility):

{into \tracingassigns=1}
{changing \tracingmacros=0}
{into \tracingmacros=1}


{changing \aaa=undefined}
{into \aaa=macro:->something}

{changing \bbb=undefined}
{into \bbb=macro:->else \aaa , else}

\bbb ->else \aaa , else

\aaa ->something

{changing \ccc=undefined}
{into \ccc=macro:->third else something, else, le\ETC.}


{changing \tracingassigns=1}

Now, this explains the expansion steps done by (La)TeX quite well for this short example. Unfortunately, it becomes extremely hard to read (for me) once you have to deal with possibly hundreds of these expansions, some of them maybe dealing with typesetting procedures.

So I was thinking: it shouldn't be too difficult to build an application that reads the logfile line by line and allows "stepping" through it. I'd imagine the right-arrow key (->) would step forward through the log and the left-arrow key (<-) would step backwards; possibly, one could also specify a line number of the logfile as a starting point.

The application would then simply react to lines matching '^{changing', '^{into', and possibly '^\\(.*)->(.*)'. It would display the line itself, as well as the "current" token elsewhere on screen: at the "changing" line, that extra portion of the screen would say \aaa=undefined, and at the "into" line, the snippet would change to \aaa=macro:->something.

I think this facility alone would make visualizing and understanding the (La)TeX expansion process much easier (especially in "real" documents). In fact, such an application doesn't even need a full-blown GUI; I'd imagine an ncurses terminal application would do just as well (problems with displaying long strings in a limited-width terminal notwithstanding).

So, I was wondering: is there any application similar to this out there?

Best Answer

EDIT: This answer led to (in fact, is) a package: the unravel package, on GitHub. It relies on the gtl package, so if you want to help me by testing, you'll need to grab this as well. Both packages are written using the LaTeX3 programming language provided by the expl3 package (l3kernel) and some extensions in l3experimental.

I have now implemented most of TeX's primitives. The missing parts are math mode, tables (\halign etc.), discretionaries (including \-), the output routine, \aftergroup, \letterspacefont, \pdfcopyfont, \pdfprimitive, and all XeTeX and LuaTeX goodies. Some limitations are also unavoidable: category codes are fixed when files are opened the first time, \outer macros will break the package, and begin-group and end-group characters other than left and right braces will cause trouble. Despite all those restrictions, \unravel{\documentclass{article}\relax} will show you all the nitty-gritty details of what TeX does when going through article.cls (and before this, all the work that goes into deciding whether or not a file is worth reading). Beware: at full speed, this takes several minutes and 20000 steps.

As of now, the unravel package only provides a very simple interface to monitor TeX's activities while it is going through the expansion and typesetting process. One can only go forward. Let us give an example of use. Put the following code in a file, say filename.tex, and run pdflatex filename.tex in a terminal.

\documentclass{article}
\usepackage{unravel}
\AtEndDocument{\message{Bye!}}
\begin{document}
\end{document}

After a small welcome message, \unravel will wait for your input. Either go through the steps one at a time by pressing Enter, or type s20o1 ("scroll 20 steps but still output", with s for scroll and o for output) and then press Enter. In the latter case, the output is similar to what follows (details depend on the version).

======== Welcome to the unravel package ========
"<|" denotes the output to TeX's stomach.
"||" denotes tokens waiting to be used.
"|>" denotes tokens that we will act on.
Press <enter> to continue; 'h' <enter> for help.

|> \AtEndDocument {\message {Bye!}}
s20o1
[===== Step 1 =====] 
\AtEndDocument = macro:->\g@addto@macro \@enddocumenthook .
|> \g@addto@macro \@enddocumenthook {\message {Bye!}}
[===== Step 2 =====] 
\g@addto@macro = \long macro:#1#2->\begingroup \toks@ \expandafter {#1#2}\xdef
#1{\the \toks@ }\endgroup .
|> \begingroup \toks@ \expandafter {\@enddocumenthook \message {Bye!}}\xdef
|> \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 3 =====] \begingroup = \begingroup.
<| \begingroup 
|> \toks@ \expandafter {\@enddocumenthook \message {Bye!}}\xdef
|> \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 4 =====] \toks@ = \toks0.
<| \begingroup 
|| \toks@ 
|> \expandafter {\@enddocumenthook \message {Bye!}}\xdef \@enddocumenthook
|> {\the \toks@ }\endgroup 
[===== Step 5 =====] \expandafter {.
<| \begingroup 
|| \toks@ \expandafter {
|> \@enddocumenthook \message {Bye!}}\xdef \@enddocumenthook {\the \toks@
|> }\endgroup 
[===== Step 6 =====] \@enddocumenthook = macro:->.
<| \begingroup 
|| \toks@ \expandafter {
|> \message {Bye!}}\xdef \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 7 =====] back_input: \expandafter {.
<| \begingroup 
|| \toks@ 
|> {\message {Bye!}}\xdef \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 8 =====] Set \toks@(\toks0)=\message {Bye!}.
<| \begingroup 
|> \xdef \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 9 =====] \xdef .
<| \begingroup 
|| \xdef 
|> \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 10 =====] \@enddocumenthook .
<| \begingroup 
|| \xdef \@enddocumenthook 
|> {\the \toks@ }\endgroup 
[===== Step 11 =====] {.
<| \begingroup 
|| \xdef \@enddocumenthook {
|> \the \toks@ }\endgroup 
[===== Step 12 =====] \the .
<| \begingroup 
|| \xdef \@enddocumenthook {\the 
|> \toks@ }\endgroup 
[===== Step 13 =====] \toks@ .
<| \begingroup 
|| \xdef \@enddocumenthook {\the \toks@ 
|> }\endgroup 
[===== Step 14 =====] \message {Bye!}.
<| \begingroup 
|| \xdef \@enddocumenthook {\message {Bye!}
|> }\endgroup 
[===== Step 15 =====] }.
<| \begingroup 
|| \xdef \@enddocumenthook {\message {Bye!}}
|> \endgroup 
[===== Step 16 =====] Set \@enddocumenthook=macro:->\message {Bye!}.
<| \begingroup 
|> \endgroup 
[===== Step 17 =====] \endgroup = \endgroup.
<| \begingroup \endgroup 
|> 
<| \begingroup \endgroup 
|> 

Step 17 was the last!

Of course, the effect of the \AtEndDocument command takes place, and there is a message "Bye!" at the end of the compilation.


This example did not involve any complicated expansion. For more fun in this direction, try to understand how the l3fp expression parsing works...

\documentclass{article}
\usepackage{unravel}
\begin{document}
\ExplSyntaxOn
\unravel { \fp_eval:n { sin(2pi/3) } }
\ExplSyntaxOff
\end{document}

After some steps, I get the following shown on my terminal:

[===== Step 3334 =====] \exp_after:wN \__fp_pack:NNNNNwn 
|| \exp_after:wN \__fp_to_decimal_dispatch:w 
|| \tex_romannumeral:D -48
|| \tex_romannumeral:D 
|| \exp_after:wN \__fp_parse_after:ww 
|| \tex_romannumeral:D -48
|| \exp_after:wN \__fp_parse_until_test:NwN 
|| \exp_after:wN \c_minus_one 
|| \tex_romannumeral:D -48
|| \exp_after:wN \__fp_fixed_mul_after:wn 
|| \int_use:N 
|| \__int_eval:w -50000
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*9877+9877*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*5598+9877*9877+5598*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*2990+9877*5598+5598*9877+2990*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|> \int_use:N \__int_eval:w \c__fp_trailing_shift_int
|> +9877*2990+5598*5598+2990*9877+(5598*2990+2990*5598+\__fp_fixed_mul:nnnnnnnw
n
|> {0000}{0000}{05235}{9877}{05235}{9877}{0000}{0000};{\exp_after:wN
|> \__fp_sin_series_aux:NNnww \exp_after:wN \__fp_fixed_to_float:wN
|> \__int_value:w \if_int_odd... (223 chars)

(I obtained this by typing s3333 and then pressing Enter twice.) What does all this mean? Well, at some point during the expansion of \fp_eval:n { sin(2pi/3) }, TeX found \exp_after:wN (the LaTeX3 name for \expandafter), kept the next token, \__fp_to_decimal_dispatch:w, for later expansion, and expanded what follows. What follows was \tex_romannumeral:D (the next token in the part of the screen marked with ||), which made TeX look for a number. After expanding some tokens further, TeX found the (incomplete) number -`0 (equal to -48), which made it expand further, and so on. The snapshot shown on the screen is taken at a time when TeX is 24 levels deep in such nested expansions (all the tokens in the || part), and is going towards a 10th level because the last \exp_after:wN of the || part, after jumping over \__fp_fixed_mul_after:wn, hits \int_use:N (aka \the). Needless to say, macros that perform floating point computations are a bit hard to follow.
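The \tex_romannumeral:D -48 entries in that snapshot come from a standard expansion trick, which can be seen in isolation in plain LaTeX2e. The following is a minimal sketch; the macro names \aaa, \bbb and \ccc are only illustrative and have nothing to do with l3fp. \romannumeral has to read a number, the character constant `0 has the value 48, and after such a constant TeX keeps expanding tokens while it looks for an optional space. Since the number is negative, the roman numeral itself is empty, so only the side effect (the expansion) remains.

```latex
\documentclass{article}
\begin{document}
\def\aaa{something}
\def\bbb{\aaa}
% The three \expandafter's jump over \def, \ccc and { to reach \romannumeral.
% \romannumeral -`0 reads the number -48 and, while looking for an optional
% trailing space, expands \bbb, then \aaa, and stops at the letter "s".
% A non-positive number yields an empty roman numeral, so nothing is inserted.
\expandafter\def\expandafter\ccc\expandafter{\romannumeral -`0\bbb}
\typeout{\meaning\ccc}% writes: macro:->something
\end{document}
```

Without the \romannumeral part, \ccc would simply be defined as \bbb; with it, the definition already contains the fully expanded text.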

In general, there can be three parts: |> denotes tokens on input that TeX has not yet seen, or that have been reinserted, for instance after a macro expansion; || denotes tokens that are stored for later; <| denotes commands that have reached TeX's main loop (i.e., have gone through the whole machinery of expansion) and have an impact on typesetting. Definitions are performed right away.
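To relate these three regions to the example in the question, one could pass the same definitions to \unravel. This is only a sketch: it assumes that \unravel accepts the definitions as a braced argument in the same way as the other \unravel calls shown in this answer, and the exact steps displayed depend on the package version.

```latex
\documentclass{article}
\usepackage{unravel}
\begin{document}
% Same definitions as in the question's test.tex. Going by the
% \AtEndDocument run above, each assignment is expected to be reported
% as a "Set ...=macro:->..." step (an assumption, not checked output).
\unravel{%
  \def\aaa{something}%
  \def\bbb{else \aaa, else}%
  \edef\ccc{third \bbb, level}%
}
\end{document}
```

Stepping through with Enter should show \bbb and then \aaa being expanded inside the \edef, which is the information the question tried to extract from the \tracingmacros log.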

Implementing unravel was, and still is, difficult, and I very often had to rely on the TeX source code to find out how Knuth did various things. In particular, the nesting of conditionals is a nightmare. Those who know the source of TeX can probably recognise clear traces of its influence in names such as \@@_scan_int: or \@@_get_x_next:. Also, each primitive is given a command code and a character code, whose values follow precisely those of pdfTeX.

One direction I would like to explore is to output all the data to an XML file (or some other format), which could then be processed by various other tools, perhaps giving more interactivity. Another interesting aspect would be to produce typeset content rather than on-screen diagnostics. This would make it possible to distinguish more visually, for instance, between the pieces of the || region, and also to distinguish category codes through color, which can be important in some debugging tasks. As for producing fewer steps, I am wondering whether a useful filter would be to give a regular expression and silently perform the expansions of commands that match it.