[TeX/LaTeX] What exactly happens when LaTeX runs on a dtx file

dtx

I am confused about how docstrip/ltxdoc really runs. It seems that LaTeX runs twice when called on a simple dtx file such as this one.

% \iffalse
%<*installation>
\expandafter\begingroup
\input docstrip.tex
\keepsilent
\askforoverwritefalse
\nopreamble
\nopostamble
\generate{\file{\jobname.cls}{\from{\jobname.dtx}{latex}}}
\expandafter\endgroup
%</installation>
%<*documentation>
\ProvidesFile{\jobname.dtx}
\documentclass{ltxdoc}
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}
%</documentation>
%<*latex>
% \fi
%    \begin{macrocode}
\newcommand{\mystuff}{mystuff}
%    \end{macrocode}
% \iffalse
%</latex> 
% \fi

I understand the first pass easily: docstrip is read and produces the .cls file as asked by the \generate command (the installation block).

But then a second pass seems to be done, with the first group (from \begingroup to \endgroup) being ignored and the rest of the file (the documentation block) being used as normal LaTeX code, thus producing the expected .pdf (or .dvi) file.

If the \expandafter\begingroup and \expandafter\endgroup lines are removed only the first pass is done and an error is raised.

I read the docstrip documentation (and code but certainly not seriously enough) but did not find any allusion to this double execution.

So I am looking for answers to the following sub-questions:

  • Am I right (two passes)?
  • Did I miss something in the docstrip documentation (if yes where)?
  • Is the code read twice or once?
  • Why do we need the first group to exist in order for LaTeX to produce the two expected files?

Best Answer

The best way to understand what happens when TeX reads a .dtx file is to work through it slowly and track the lines. I'll use your example with a few minor modifications (to fix what seem to be mistakes and to add some useful illustration). I'm going to assume we are running latex <myfile>.dtx, as this particular file is 'self-extracting' under such circumstances. I'm also going to group the lines for reasons that will become clear. While, as we'll see, there is only one run, there are multiple passes over the same file: that gets complicated, so I'll take it in steps.

'Main' pass

We start with the line

% \iffalse

which is a comment for the time being but will be important later.

Next we get

%<*installation>
\begingroup
\input docstrip.tex
\keepsilent
\askforoverwritefalse
\nopreamble
\nopostamble
\generate{\file{\jobname.cls}{\from{\jobname.dtx}{latex}}}
\endgroup
%</installation>

At this stage, the 'guard' lines %<*installation> and %</installation> are comments and are ignored. TeX therefore opens a group and then inputs the program DocStrip. This is done in a group so the DocStrip stuff doesn't affect what comes later.
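
The effect of the group can be seen in isolation. This is only a minimal sketch: \demo is a hypothetical macro standing in for the (local) definitions DocStrip makes, and it is not part of the file under discussion.

```latex
\documentclass{article}
\begingroup
  % \demo is a hypothetical macro, defined only inside the group
  \def\demo{defined inside the group}
\endgroup
\begin{document}
% The local definition was discarded at \endgroup,
% so the \else branch is taken here
\ifdefined\demo defined\else undefined\fi
\end{document}
```

Anything defined locally between \begingroup and \endgroup is forgotten when the group closes, which is why the driver that follows sees an ordinary LaTeX set up.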

DocStrip then does some set up before getting to the \generate line. This is an instruction to read \jobname.dtx (the current file) again but do some other processing to create \jobname.cls. That happens now, but to understand the process, for the moment I'll treat it as a black box that has created \jobname.cls, and come back to it in the section on the DocStrip pass.

Once the group closes we are back to normal in terms of command meaning. Moving on, we have

%<*documentation>
\ProvidesFile{\jobname.dtx}
\documentclass{ltxdoc}
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}
%</documentation>

This is the so-called 'driver' part of the .dtx (it used to be provided as a .drv file many years ago). This part looks like a normal LaTeX file apart from the \DocInput line. What happens here is that LaTeX sets up for normal typesetting and then reads \jobname.dtx yet again, this time as a typesetting pass. Again, this is actually happening 'here', but I'll talk about it separately.

As we've now passed \end{document}, the lines

%<*latex>
% \fi
% Some text about this macro.
%    \begin{macrocode}
\newcommand{\mystuff}{mystuff}
%    \end{macrocode}
% \iffalse
%</latex> 
% \fi

never get read as part of the 'main' pass, so we don't worry about them at the moment.

The net result is a class file and a PDF: they come from the two secondary 'passes' I'll describe below.

DocStrip pass

DocStrip reads files looking for guards, then copies any non-comment lines into whatever file it's been asked to. In our DocStrip run we are creating a file \jobname.cls using the guard name latex. Let's see how this pass reads the .dtx.

We once again start with a comment line which isn't a guard, so this gets ignored completely

% \iffalse

The next line is a guard (it starts %<): if you read up on DocStrip you'll see %<* means that this guard applies until a matching %</. The guard name here is installation: that's not the one we are after, so all of these lines are skipped. (Note that guard lines are special comments, so DocStrip doesn't ignore every comment line.)

%<*installation>
\begingroup
\input docstrip.tex
\keepsilent
\askforoverwritefalse
\nopreamble
\nopostamble
\generate{\file{\jobname.cls}{\from{\jobname.dtx}{latex}}}
\endgroup
%</installation>

We now find a second guard block, this time called documentation. That's still not what we want so again it gets ignored.

%<*documentation>
\ProvidesFile{\jobname.dtx}
\documentclass{ltxdoc}
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}
%</documentation>

Finally we find the guard we are after. Any non-comment lines here are copied to \jobname.cls: that means just the one reading \newcommand{\mystuff}{mystuff}. You might be wondering about the comment lines here: they will come to the fore in the typesetting part.

%<*latex>
% \fi
% Some text about this macro.
%    \begin{macrocode}
\newcommand{\mystuff}{mystuff}
%    \end{macrocode}
% \iffalse
%</latex> 
% \fi
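
As an aside, block guards are not the only form DocStrip understands. The fragment below is a sketch of the other common forms; \morestuff, \otherstuff and \sharedstuff are hypothetical macros added purely for illustration.

```latex
%<*latex>
\newcommand{\mystuff}{mystuff}
%</latex>
%<latex>\newcommand{\morestuff}{kept when the latex option is given}
%<!latex>\newcommand{\otherstuff}{kept when the latex option is not given}
%<latex|documentation>\newcommand{\sharedstuff}{kept for either option}
```

A guard without * or / applies only to the rest of its own line, ! negates an option and | means 'or'. With the latex option from our \generate line, the \mystuff, \morestuff and \sharedstuff lines would be copied and the \otherstuff line dropped.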

Typesetting pass

During the operation of the line \DocInput{\jobname.dtx} we are looking for the document body to typeset. LaTeX is reading our file again, but this time, crucially, % is not a comment character: it is ignored completely. So every line is read as if its leading % were not there.
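
The change in behaviour can be tried on its own. This is a minimal sketch, assuming the effect comes from giving % category code 9 ('ignored'), which is how I understand the doc package to arrange it; the real \DocInput also takes care of reading the file and restoring the normal meaning afterwards.

```latex
\documentclass{article}
\begin{document}
\catcode`\%=9
% This line is typeset, because the percent sign is now ignored.
\catcode`\%=14
% This line is a comment again and produces no output.
\end{document}
```

Note that no trailing comment can follow the first \catcode line: once the assignment is made, anything after a % on later input is ordinary text.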

Under this regime, the first line

% \iffalse

isn't a comment, it's an \iffalse conditional. That means 'skip everything until a matching \fi', so the lines

%<*installation>
\begingroup
\input docstrip.tex
\keepsilent
\askforoverwritefalse
\nopreamble
\nopostamble
\generate{\file{\jobname.cls}{\from{\jobname.dtx}{latex}}}
\endgroup
%</installation>
%<*documentation>
\ProvidesFile{\jobname.dtx}
\documentclass{ltxdoc}
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}
%</documentation>
%<*latex>

are all skipped up to the matching

% \fi

line. The result is that none of the extraction code or the driver is typeset.
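
The skipping itself is plain TeX behaviour and can be checked in a small stand-alone file. In this sketch \undefinedmacro is a deliberately undefined, hypothetical name, there only to show that skipped material is never executed.

```latex
\documentclass{article}
\begin{document}
Before.
\iffalse
This text is skipped, and so is \undefinedmacro.
\fi
After.
\end{document}
```

This should typeset only 'Before.' and 'After.' with no error about the undefined macro, in the same way that the installation and driver code cause no trouble inside \DocInput.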

We then get

% Some text about this macro.
%    \begin{macrocode}
\newcommand{\mystuff}{mystuff}
%    \end{macrocode}

Again, % isn't a comment here so we get the macrocode environment provided by the ltxdoc class. That typesets the content (the code line) as code and does things like indexing it. I've also added a line of normal text here: this will be typeset just before the macrocode environment.

Finally, we get

% \iffalse
%</latex> 
% \fi

The guard here would normally be typeset, but the author has skipped that by surrounding it with an \iffalse conditional. (I would probably typeset the guards, so I'd place them inside a macrocode environment.)
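
A sketch of that alternative layout for the end of the file, assuming the rest is unchanged: the \iffalse block now closes straight after the driver, and the latex guards move inside the macrocode environment.

```latex
%</documentation>
% \fi
% Some text about this macro.
%    \begin{macrocode}
%<*latex>
\newcommand{\mystuff}{mystuff}
%</latex>
%    \end{macrocode}
```

DocStrip still sees the same guards and copies the same code line, while in the typesetting pass the guards should appear in the printed listing alongside the code.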

Conclusion

There is only one TeX run here, but the same source is read three times, so it's easiest to think of it as three passes. If you do that, you can work out what is happening relatively easily. (Perhaps worth noting is that a 'classical' .dtx file would have the installation part as a separate .ins file, without the need for a group, and, as I've noted, very old .dtx files also have the driver/typesetting part as a separate file.)
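
For comparison, a classical installation file might look roughly like this. It is a sketch only: myfile.ins, myfile.dtx and myfile.cls are placeholder names, and the settings are simply the ones from the example above.

```latex
\input docstrip.tex
\keepsilent
\askforoverwritefalse
\nopreamble
\nopostamble
\generate{\file{myfile.cls}{\from{myfile.dtx}{latex}}}
\endbatchfile
```

Running tex myfile.ins would then do the DocStrip pass on its own, and running latex myfile.dtx would only typeset the documentation.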
