[Tex/LaTex] A LaTeX log analyzer application (visualizing TeX expansion)

debugging expansion logging macros tex-core

This question led to a new package:
unravel

Consider the following MWE, test.tex:

\documentclass[12pt]{article}

\begin{document}

\tracingassigns=1
\tracingmacros=1

\def\aaa{something}
\def\bbb{else \aaa, else}
\edef\ccc{third \bbb, level}

\tracingassigns=0
\tracingmacros=0

\end{document}

If you build this with pdflatex test.tex, then the logfile, test.log, contains something like this (line breaks added for legibility):

{into \tracingassigns=1}
{changing \tracingmacros=0}
{into \tracingmacros=1}


{changing \aaa=undefined}
{into \aaa=macro:->something}

{changing \bbb=undefined}
{into \bbb=macro:->else \aaa , else}

\bbb ->else \aaa , else

\aaa ->something

{changing \ccc=undefined}
{into \ccc=macro:->third else something, else, le\ETC.}


{changing \tracingassigns=1}

Now, this explains the expansion steps done by (La)TeX quite well for this short example. Unfortunately, it becomes extremely hard to read (for me) once you have to deal with possibly hundreds of these expansions, some of them maybe dealing with typesetting procedures.
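One way to keep the log volume down is to switch tracing on only around the code of interest. The following is a minimal sketch, not part of the original question: since the tracing parameters are ordinary integer parameters, setting them inside a group means they are restored automatically at \endgroup. It assumes an e-TeX-enabled engine such as pdflatex (needed for \tracingassigns), and it uses \xdef instead of \edef only so that \ccc survives the group.

```latex
\documentclass[12pt]{article}

\begin{document}

\def\aaa{something}
\def\bbb{else \aaa, else}

\begingroup
  % Tracing is local to this group and is reset at \endgroup.
  \tracingassigns=1
  \tracingmacros=1
  % \xdef = \global\edef, so \ccc is still defined after the group.
  \xdef\ccc{third \bbb, level}
\endgroup

\end{document}
```

With this layout the log only contains the trace for the \xdef line, not for the definitions of \aaa and \bbb.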

So I was thinking: it shouldn't be too difficult to build an application that reads the logfile line by line and allows "stepping" through it. I'd imagine the right arrow key (->) would step forward through the log and the left arrow key (<-) would step backwards; possibly, one could also specify a line number of the logfile as a starting point.

Then, the application would simply react to '^{changing', '^{into', and possibly '^\\(.*)->(.*)', and would display the line, as well as the "current" token, elsewhere on screen. So at the "changing" line, the extra portion of the screen would say \aaa=undefined, and at the "into" line, the snippet would change to \aaa=macro:->something.

I think just this facility would make visualizing and understanding the (La)TeX expansion process much easier (especially in "real" documents). In fact, such an application doesn't even need a full-blown GUI; I'd imagine an ncurses terminal application would do just as well (problems with displaying long strings in a limited-width terminal notwithstanding).

So, I was wondering – is there any application similar to this out there?

Best Answer

EDIT: This answer led to (in fact, is) a package: the unravel package, on GitHub. It relies on the gtl package, so if you want to help me by testing, you'll need to grab this as well. Both packages are written using the LaTeX3 programming language provided by the expl3 package (l3kernel) and some extensions in l3experimental.

I have now implemented most of TeX's primitives. The parts missing are math mode, tables (\halign etc.), discretionaries (including \-), the output routine, \aftergroup, \letterspacefont, \pdfcopyfont, \pdfprimitive, and all XeTeX and LuaTeX goodies. Unavoidably also, category codes are fixed when files are opened the first time, \outer macros will break the package, and begin-group and end-group characters other than left and right braces will cause trouble. Despite all those restrictions, \unravel{\documentclass{article}\relax} will show you all the nitty gritty details of what TeX does when going through article.cls (and before this, all the work that comes into deciding whether or not a file is worth reading). Beware: at full speed, this takes several minutes and 20000 steps.

As of now, the unravel package only provides a very simple interface to monitor TeX's activities while it is going through the expansion and typesetting process. One can only go forward. Let us give an example of use. Put the following code in a file, say filename.tex, and run pdflatex filename.tex in a terminal.

\documentclass{article}
\usepackage{unravel}
\unravel{\AtEndDocument{\message{Bye!}}}
\begin{document}
\end{document}

After a small welcome message, \unravel will wait for your input. Either go through the steps one at a time by pressing enter, or type s20o1 ("scroll 20 steps but still output", hence the letters s and o) then enter. In the latter case, the output is (similar to, depending on the version) what follows.

======== Welcome to the unravel package ========
"<|" denotes the output to TeX's stomach.
"||" denotes tokens waiting to be used.
"|>" denotes tokens that we will act on.
Press <enter> to continue; 'h' <enter> for help.

|> \AtEndDocument {\message {Bye!}}
s20o1
[===== Step 1 =====] 
\AtEndDocument = macro:->\g@addto@macro \@enddocumenthook .
|> \g@addto@macro \@enddocumenthook {\message {Bye!}}
[===== Step 2 =====] 
\g@addto@macro = \long macro:#1#2->\begingroup \toks@ \expandafter {#1#2}\xdef
#1{\the \toks@ }\endgroup .
|> \begingroup \toks@ \expandafter {\@enddocumenthook \message {Bye!}}\xdef
|> \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 3 =====] \begingroup = \begingroup.
<| \begingroup 
|> \toks@ \expandafter {\@enddocumenthook \message {Bye!}}\xdef
|> \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 4 =====] \toks@ = \toks0.
<| \begingroup 
|| \toks@ 
|> \expandafter {\@enddocumenthook \message {Bye!}}\xdef \@enddocumenthook
|> {\the \toks@ }\endgroup 
[===== Step 5 =====] \expandafter {.
<| \begingroup 
|| \toks@ \expandafter {
|> \@enddocumenthook \message {Bye!}}\xdef \@enddocumenthook {\the \toks@
|> }\endgroup 
[===== Step 6 =====] \@enddocumenthook = macro:->.
<| \begingroup 
|| \toks@ \expandafter {
|> \message {Bye!}}\xdef \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 7 =====] back_input: \expandafter {.
<| \begingroup 
|| \toks@ 
|> {\message {Bye!}}\xdef \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 8 =====] Set \toks@(\toks0)=\message {Bye!}.
<| \begingroup 
|> \xdef \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 9 =====] \xdef .
<| \begingroup 
|| \xdef 
|> \@enddocumenthook {\the \toks@ }\endgroup 
[===== Step 10 =====] \@enddocumenthook .
<| \begingroup 
|| \xdef \@enddocumenthook 
|> {\the \toks@ }\endgroup 
[===== Step 11 =====] {.
<| \begingroup 
|| \xdef \@enddocumenthook {
|> \the \toks@ }\endgroup 
[===== Step 12 =====] \the .
<| \begingroup 
|| \xdef \@enddocumenthook {\the 
|> \toks@ }\endgroup 
[===== Step 13 =====] \toks@ .
<| \begingroup 
|| \xdef \@enddocumenthook {\the \toks@ 
|> }\endgroup 
[===== Step 14 =====] \message {Bye!}.
<| \begingroup 
|| \xdef \@enddocumenthook {\message {Bye!}
|> }\endgroup 
[===== Step 15 =====] }.
<| \begingroup 
|| \xdef \@enddocumenthook {\message {Bye!}}
|> \endgroup 
[===== Step 16 =====] Set \@enddocumenthook=macro:->\message {Bye!}.
<| \begingroup 
|> \endgroup 
[===== Step 17 =====] \endgroup = \endgroup.
<| \begingroup \endgroup 
|> 
<| \begingroup \endgroup 
|> 

Step 17 was the last!

Of course, the effect of the \AtEndDocument command takes place, and there is a message "Bye!" at the end of the compilation.
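The same approach can be tried on the MWE from the question. This is only a sketch: it assumes that \unravel takes the code to be stepped through as its argument, as in the other examples of this answer, and that the installed version of the package handles \edef. The exact output is not reproduced here, since it depends on the version.

```latex
\documentclass[12pt]{article}
\usepackage{unravel}

\begin{document}

\def\aaa{something}
\def\bbb{else \aaa, else}

% Step through the expansion of \bbb and \aaa inside the \edef.
\unravel{\edef\ccc{third \bbb, level}}

\end{document}
```

Pressing enter repeatedly should then walk through the expansion of \bbb and \aaa one step at a time, instead of leaving it all in the log file.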

This example did not involve any complicated expansion. For more fun in this direction, try to understand how the l3fp expression parsing works...

\documentclass{article}
\usepackage{unravel}
\begin{document}
\ExplSyntaxOn
\unravel { \fp_eval:n { sin(2pi/3) } }
\ExplSyntaxOff
\end{document}

After some steps, I get the following shown on my terminal:

[===== Step 3334 =====] \exp_after:wN \__fp_pack:NNNNNwn 
|| \exp_after:wN \__fp_to_decimal_dispatch:w 
|| \tex_romannumeral:D -48
|| \tex_romannumeral:D 
|| \exp_after:wN \__fp_parse_after:ww 
|| \tex_romannumeral:D -48
|| \exp_after:wN \__fp_parse_until_test:NwN 
|| \exp_after:wN \c_minus_one 
|| \tex_romannumeral:D -48
|| \exp_after:wN \__fp_fixed_mul_after:wn 
|| \int_use:N 
|| \__int_eval:w -50000
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*9877+9877*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*5598+9877*9877+5598*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|| \int_use:N 
|| \__int_eval:w 499950000+05235*2990+9877*5598+5598*9877+2990*05235
|| \exp_after:wN \__fp_pack:NNNNNwn 
|> \int_use:N \__int_eval:w \c__fp_trailing_shift_int
|> +9877*2990+5598*5598+2990*9877+(5598*2990+2990*5598+\__fp_fixed_mul:nnnnnnnw
n
|> {0000}{0000}{05235}{9877}{05235}{9877}{0000}{0000};{\exp_after:wN
|> \__fp_sin_series_aux:NNnww \exp_after:wN \__fp_fixed_to_float:wN
|> \__int_value:w \if_int_odd... (223 chars)

(I obtained this by pressing s3333 then enter twice.) What does all this mean? Well, at some point during the expansion of \fp_eval:n { sin(2pi/3) }, TeX found \exp_after:wN (the LaTeX3 name for \expandafter), kept the next token, \__fp_to_decimal_dispatch:w, for later expansion, and expanded what follows. What follows was \tex_romannumeral:D (the next token in the part of the screen marked with ||), which made TeX look for a number. After expanding some tokens further, TeX found the (incomplete) number -`0 (equal to -48), which made it expand further, and so on. The snapshot shown on the screen is taken at a time when TeX is 24 levels deep in such nested expansions (all the tokens in the || part), and is heading towards a 25th level, since the last \exp_after:wN of the || part, after jumping over \__fp_fixed_mul_after:wn, hits \int_use:N (aka \the). Needless to say, macros to perform floating point computations are a bit hard to follow.
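The \romannumeral trick described above can be seen in isolation. The following is a minimal sketch using the primitive names \expandafter and \romannumeral rather than their expl3 equivalents; the macros \stepA, \stepB, \stepC and \x are hypothetical names chosen for illustration only.

```latex
\documentclass{article}
\begin{document}
% A hypothetical three-level chain of macros.
\def\stepA{\stepB}
\def\stepB{\stepC}
\def\stepC{result}
% The \expandafter chain expands \romannumeral exactly once. While looking
% for the end of the number -`0 (that is, -48), TeX expands \stepA, \stepB
% and \stepC, and stops at the unexpandable letter r. A negative number
% gives an empty roman numeral, so only "result" is left.
\expandafter\def\expandafter\x\expandafter{\romannumeral-`0\stepA}
\typeout{\meaning\x}% writes: macro:->result
\end{document}
```

A single expansion of \romannumeral thus triggers full expansion of whatever follows, which is why so many pending \tex_romannumeral:D tokens pile up in the || part of the snapshot.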

In general, there can be three parts: |> denotes tokens on input that TeX has not yet seen, or that have been reinserted, for instance after a macro expansion; || denotes tokens that are stored for later; <| denotes commands that have reached TeX's main loop (i.e., have gone through the whole machinery of expansion) and have an impact on typesetting. Definitions are performed right away.

Implementing unravel was and is of course difficult, and I very often ended up relying on the TeX source code to find out how Knuth did various things. In particular, the nesting of conditionals is a nightmare. Those who know the source of TeX can probably recognise clear traces of its influence in names such as \@@_scan_int: or \@@_get_x_next:. Also, each primitive is given a command code and a character code, whose values follow precisely those of pdfTeX.

One direction I would like to explore is to output all the data to an XML file (or another format) which could then be processed by various other tools, perhaps giving more interactivity. Another interesting aspect would be to produce typeset content rather than on-screen diagnostics. This would make it possible to differentiate more visually, for instance, between the pieces of the || region, and also to distinguish category codes through color, which can be important in some debugging tasks. As for producing fewer steps, I am wondering whether giving a regular expression, and silently performing the expansions of commands that match it, would provide a useful filter.

Related Solutions

[Tex/LaTex] Why isn’t everything expandable

While a definitive answer can only come from the Stanford team involved in development of TeX, and from Professor Knuth in particular, I think we can see some possible reasons.

First, Knuth designed TeX primarily to solve a particular problem (typesetting The Art of Computer Programming). He made TeX sufficiently powerful to solve the typesetting problems he faced, plus the more general case he decided to address. However, he also kept TeX (almost) as simple as necessary to achieve this. While expandable macros are useful, they are not required to solve many issues.

Secondly, there are cases where an expandable approach would be at least potentially ambiguous. Bruno's \edef\foo{\def\foo{abc}} is a good case. I'd say that here the expected result with an expandable \def is that \foo expands to nothing, but I'd also say this is not totally clear. There is the much more common case where you want something like

\begingroup
\edef\x{%
 \endgroup
 \def\noexpand\foo{\csname some-macro-to-fully-expand\endcsname}%
 }
 \x

which would be made more complex with expandable primitives.
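To make the idiom concrete, here is a small runnable sketch. The replacement text abc given to the macro named some-macro-to-fully-expand is an assumption made purely for illustration; the \typeout line is only there to inspect the result.

```latex
\documentclass{article}
\begin{document}
% Illustrative definition of the macro that is to be fully expanded.
\expandafter\def\csname some-macro-to-fully-expand\endcsname{abc}
\begingroup
\edef\x{%
  \endgroup
  \def\noexpand\foo{\csname some-macro-to-fully-expand\endcsname}%
}
% \x expands to \endgroup\def\foo{abc}: the \endgroup closes the group
% (and with it removes the local \x), then \foo is defined outside it.
\x
\typeout{\meaning\foo}% writes: macro:->abc
\end{document}
```

The idiom relies on \endgroup and \def being left untouched inside the \edef, which is exactly what would no longer hold if these primitives were expandable.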

The above example points to another grey area: what would happen with things like \begingroup and, more importantly, \relax? The fact that the latter is a non-expandable no-op is often important in TeX programming. (Indeed, the fact that \numexpr, etc., gobble an optional trailing \relax is sometimes regarded as a bad thing.)
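The difference between the two behaviours of \relax can be checked directly. This is a minimal sketch; the macro names \withnumexpr and \withnumber are hypothetical, and an e-TeX-enabled engine (such as pdflatex) is assumed for \numexpr.

```latex
\documentclass{article}
\begin{document}
% \numexpr absorbs the \relax that ends the expression...
\edef\withnumexpr{\the\numexpr 1+2\relax x}
\typeout{\meaning\withnumexpr}% writes: macro:->3x
% ...whereas after an ordinary number the \relax survives as a token.
\edef\withnumber{\number 3\relax x}
\typeout{\meaning\withnumber}% writes: macro:->3\relax x
\end{document}
```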

Finally, I suspect that ease of implementation is important. The approach of having separate expansion and execution steps makes the flow relatively easy to understand and, I suspect, to implement. An approach which mixes expansion and execution requires a more complex architecture. Here, we have to remember when Knuth was writing TeX, and the fact that programming ideas which we take for granted today were not necessarily applicable in the late 1970s. A fully expandable approach would, I suspect, have made the code more complex and slower. The speed impact was important when TeX was running on 'big' computers.

[Tex/LaTex] Macro to do nothing via a \def

Short answer: You can't use assignments like \def or \edef inside expandable contexts like \edef because they are not expandable!

Long answer:
An \edef expands its content until only non-expandable tokens are left (note that text is not expandable). \def is an assignment and all assignments must be executed and are therefore not expandable. Your \DoNothingB actually does something: it defines \Temp!

Let me explain what happens on the following example:

\def\Temp{before}
% ...
\edef\NewTemp{\def\Temp{something}}

Here the \edef first tries to expand \def, which is not expandable and therefore stays as it is, then \Temp, which is expanded to before, and then the rest: the tokens {, s, o, ..., g, and }, which are not expandable either. Therefore you get \NewTemp as \def before{something}, which will instruct TeX, once \NewTemp is expanded, i.e. used, to define b with the parameter text efore and the replacement text something. Because b is immutable (as long as it is not declared an active character), you will get an error here.
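This can be checked without triggering the error by looking at the meaning of \NewTemp instead of using it. The following is a minimal sketch; the second \edef, with \noexpand, is added here only for contrast.

```latex
\documentclass{article}
\begin{document}
\def\Temp{before}
% \def is unexpandable and stays; \Temp is replaced by its replacement text.
\edef\NewTemp{\def\Temp{something}}
\typeout{\meaning\NewTemp}% writes: macro:->\def before{something}
% With \noexpand the name \Temp survives the \edef.
\edef\NewTemp{\def\noexpand\Temp{something}}
\typeout{\meaning\NewTemp}% writes: macro:->\def \Temp {something}
\end{document}
```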

In your case, \Temp was defined as foo 4 in your fourth example, where you use \DoNothingB outside of an \edef. In the next usage, inside an \edef (\edef\NewTemp{\DoNothingB{foo 6}}), you get \def foo4{foo 6}foo4 for \NewTemp because both uses of \Temp are expanded right then, which leads to the above-mentioned error. The \def foo4{foo 6} is discarded as part of the error handling and then foo4 is typeset. If \Temp had been defined differently beforehand, the output would differ accordingly.

A few \expandafters don't help you here, because the order of expansion is not the issue: \edef expands everything anyway, until only non-expandable tokens remain.

However, you might want to add a \noexpand directly before each \Temp inside \DoNothingB to protect it from premature expansion in the first \edef. This, in turn, doesn't work inside a \def. Normally you can use \protect with LaTeX's \protected@edef instead. Depending on the context, \protect is either \relax (outside \protected@edef) or \@unexpandable@protect (inside \protected@edef), which is similar to \noexpand. For usage with \def, you need to test for this mode yourself and add a \noexpand manually if inside an \edef.

The following code does this. Note that you need to replace all \edefs with \protected@edef for this to work. The normal \edef and other expanding contexts like \message or \write will not work.

\documentclass{article}
\usepackage{xcolor}% To highlight problem output

\newcommand*{\DoNothingA}[1]{#1}% Does everything I want in all cases

\makeatletter
\newcommand*{\DoNothingB}[1]{% Works mostly, except if used in a \edef
    \ifx\protect\@unexpandable@protect
        \def\noexpand\Temp{#1}%
    \else
        \def\Temp{#1}%
    \fi
    \protect\Temp%
}%
\makeatother

\begin{document}
% These two work great.
\DoNothingA{foo 1}
\DoNothingB{foo 2}
-- used output of macro directly

% These two also work great.
\def\NewTemp{\DoNothingA{foo 3}}
\NewTemp
\def\NewTemp{\DoNothingB{foo 4}}
\NewTemp
~-- used \verb|\def| to store output of macro


\edef\NewTemp{\DoNothingA{foo 5}}
\NewTemp
\color{red}% Highlight problem output
\makeatletter
\protected@edef\NewTemp{\DoNothingB{foo 6}}
\makeatother
\NewTemp 
\color{black}
~-- used \verb|\edef| to store output of macro
% Why is the last output "foo 4" (after ignoring error)??
\end{document}
