[TeX/LaTeX] Generate a merged LaTeX file, with \input code in place

include input

I'm currently working on an experiment that involves re-typesetting a digitized physical book from OCRed images. For various reasons involving the workflow, the resulting project architecture is a "backbone" LaTeX document containing a series of \input{page0000.tex} lines, one for each page image of the original book.

It seems like it should be trivial, but I must not have stumbled on the correct keywords: I'd like to be able to generate a single, monolithic LaTeX file, where each \input{} command has been replaced by the contents of the file it names, while \include'd files are left alone. In other words: something to stitch together the pages into a single run-on LaTeX file.

I could do it rather simply in Ruby, but I just have to think there's a pure TeX (cli?) solution.

Best Answer

I'd use cat. But since you asked for a TeX implementation, here you go.

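% merge.tex: copy a master file to an output file, replacing each
% \input{name} with the contents of the named file.
% Requires e-TeX (\readline, \unless).
% Drop the end-of-line character so line ends inside macros add no spaces.
\endlinechar=-1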
\newread\in
\newwrite\out
\message{Please enter input file name: }
\read16to\inname
\openin\in=\inname \relax
\ifeof\in
        \immediate\write16{Failed to open \inname.}
        \expandafter\end
\fi
\message{Please enter output file name: }
\read16to\outname
\immediate\openout\out=\outname \relax
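% \readline returns characters of category 12 (spaces are category 10), so
% the delimiter text \input{...} must be made of category-12 tokens too.
% In this group @ ( ) act as \ { }. Uppercase I N P U T are made category 12,
% leaving the lowercase letters in macro names intact, and \lowercase then
% maps them to i n p u t.
\begingroup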
\catcode`@0
\catcode`(1
\catcode`)2
\catcode`\{12
\catcode`\}12
\catcode`I12
\catcode`N12
\catcode`P12
\catcode`U12
\catcode`T12
\catcode`\\12
@lowercase(
        @gdef@dosplitline#1\INPUT{#2}#3@splitsentinal(@def@ante(#1)@def@file(#2)@def@post(#3))
        @gdef@splitline(@expandafter@dosplitline@line\INPUT{@sentinal}@splitsentinal)
)
@endgroup
\def\splitpost{\expandafter\dosplitline\post\splitsentinal}
\def\sentinal{\sentinal}
\catcode`\%12
\def\processline{
        \ifx\file\sentinal
                \immediate\write\out{\ante}
                \let\temp\relax
        \else
                \immediate\write\out{\ante%}
                \let\temp\processline
                \copyfile
                \splitpost
                \ifx\empty\ante
                        \ifx\file\sentinal
                                \let\temp\relax
                        \fi
                \fi
        \fi
        \temp
}
\newread\f
\def\copyfile{
        \openin\f=\file\relax
        \ifeof\f
                \immediate\write16{Failed to open \file. Continuing.}
        \else
                \begingroup
                \loop
                        \readline\f to\line
                        \unless\ifeof\f
                        \immediate\write\out{\line}
                \repeat
                \endgroup
                \closein\f
        \fi
}

\loop
        \readline\in to\line
        \unless\ifeof\in
        \splitline
        \processline
\repeat
\closein\in
\immediate\closeout\out
\end

You need e-TeX to run this (pdfTeX would work), since the code uses the e-TeX primitives \readline and \unless. It will ask you for the name of the master file and then for the name of the output file:

$ etex merge
This is pdfTeX, Version 3.1415926-1.40.11 (TeX Live 2010)
 restricted \write18 enabled.
entering extended mode
(./merge.tex Please enter input file name: 
\inname=base
Please enter output file name: 
\outname=output
 )
No pages of output.
Transcript written on merge.log.

Here, I answered the two prompts with base and output, so it read base.tex and produced output.tex.

It isn't perfect. Spaces after \input{foo} are lost, but you can replace \input{foo} bar with \input{foo}{} bar to keep them. Also, it assumes that % is always a comment, at least on \input lines.

Here's my one test example.

\documentclass{article}
\begin{document}
\input{a}

asdf \input{b}\input{c}{} \input{d}{}
\input{e}{} asdf
\end{document}

The files a.tex through e.tex each consist of a single letter, A through E, respectively. Here's the output.

\documentclass{article}
\begin{document}
%
A

asdf %
B
%
C
{} %
D
{}
%
E
{} asdf
\end{document}

Note that \input replacement is not recursive, although it probably could be, at least up to depth 14 (which would hit the maximum number of TeX input streams—unless e-TeX supports more).

Finally, this is totally ridiculous. Don't use it. Use something meant for dealing with files instead.
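
As a rough illustration of that advice, here is a minimal Python sketch of the same replacement. It is not part of the TeX solution above, and the names merge, resolve, INPUT_RE and the depth parameter are made up for this example. It assumes the brace form \input{name} only, appends .tex when the name has no extension (a simplification of TeX's own lookup), and does not notice \input inside comments or verbatim material. With depth=1 it is non-recursive like the TeX version; a larger depth also expands \input commands inside the inserted files.

```python
import re
from pathlib import Path

# Brace form only: \input{name}. Commented-out or verbatim occurrences are
# not detected (an assumption of this sketch, not TeX's real parsing).
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")


def resolve(name, base_dir):
    """Map an input argument to a path, appending .tex if it has no extension."""
    path = Path(base_dir) / name.strip()
    return path if path.suffix else path.with_suffix(".tex")


def merge(text, base_dir=".", depth=1):
    """Return text with each input command replaced by its file's contents.

    depth=1 mirrors the non-recursive behaviour; larger values also expand
    input commands found inside the inserted files.
    """
    def substitute(match):
        try:
            body = resolve(match.group(1), base_dir).read_text(encoding="utf-8")
        except (OSError, ValueError):
            return match.group(0)  # unreadable file: keep the command as-is
        return merge(body, base_dir, depth - 1) if depth > 1 else body

    return INPUT_RE.sub(substitute, text)
```

Reading base.tex into a string, passing it to merge, and writing the result to output.tex would roughly correspond to the run shown earlier. Unlike the TeX version, this sketch inserts the file contents directly in place rather than emitting a % before each inserted file, so the exact line breaks in the output will differ.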