[TeX/LaTeX] Generate a merged LaTeX file, with \input code in place


I'm currently working on an experiment that involves re-typesetting a digitized physical book from OCRed images. For various reasons involving the workflow, the resulting project architecture is a "backbone" LaTeX document containing a series of \input{page0000.tex} lines, one for each page image of the original book.

It seems like it should be trivial, but I must not have hit on the right keywords: I'd like to generate a single, monolithic LaTeX file in which each \input{} line has been replaced by the contents of the file it names, while \include lines are left untouched. In other words: something to stitch the pages together into a single run-on LaTeX file.

I could do it rather simply in Ruby, but I have to think there's a pure TeX (CLI?) solution.

Best Answer

I'd use cat. But since you asked for a TeX implementation, here you go.

\message{Please enter input file name: }
\openin\in=\inname \relax
        \immediate\write16{Failed to open \inname.}
\message{Please enter output file name: }
\immediate\openout\out=\outname \relax
                \immediate\write16{Failed to open \file. Continuing.}
                        \readline\f to\line

        \readline\in to\line

You need to use e-TeX (pdfTeX would work) to run this. It will ask you for the name of the master file and for the name of the output file:

$ etex merge
This is pdfTeX, Version 3.1415926-1.40.11 (TeX Live 2010)
 restricted \write18 enabled.
entering extended mode
(./merge.tex Please enter input file name: 
Please enter output file name: 
No pages of output.
Transcript written on merge.log.

Here, I answered the two prompts with base and output; it read base.tex and produced output.tex.

It isn't perfect. Spaces after \input{foo} are lost, but you can replace \input{foo} bar with \input{foo}{} bar to keep them. Also, it assumes that % is always a comment, at least on \input lines.

Here's my one test example.


asdf \input{b}\input{c}{} \input{d}{}
\input{e}{} asdf

a.tex through e.tex each consist of a single letter, A through E, respectively. Here's the output.


asdf %
{} %
{} asdf

Note that \input replacement is not recursive, although it probably could be, at least up to depth 14 (which would hit the maximum number of TeX input streams—unless e-TeX supports more).

Finally, this is totally ridiculous. Don't use it. Use something meant for dealing with files instead.
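
For comparison, here is a minimal sketch of the same idea in a general-purpose language. It is not the TeX code above translated. It is a plain textual splice, written in Python, and the names `flatten`, `splice`, `expand` and `comment_start` are made up for illustration. It assumes that \input always appears as \input{name}, that a bare name means name.tex, that files are looked up relative to the directory of the file being read, and that an unescaped % always starts a comment. \include lines are never touched. Spacing around an \input in the middle of a line is only approximately what TeX itself would produce.

```python
import re
from pathlib import Path

# Matches \input{name} only; \include{name} is deliberately left alone.
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")


def comment_start(line: str) -> int:
    """Index of the first unescaped % in line, or len(line) if there is none."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif line[i] == "%":
            return i
        else:
            i += 1
    return len(line)


def flatten(path: Path, depth: int = 1) -> str:
    r"""Return the text of `path` with \input{...} replaced by file contents.

    depth=1 expands only the \input lines of `path` itself (non-recursive);
    larger values also expand \input lines inside the inserted files.
    """

    def expand(match: re.Match) -> str:
        name = match.group(1).strip()
        child = path.parent / name  # assumption: resolve next to `path`
        if not child.suffix:  # assumption: a bare name means name.tex
            child = child.with_name(child.name + ".tex")
        if not child.is_file():
            return match.group(0)  # unresolved: keep the \input as written
        if depth > 1:
            text = flatten(child, depth - 1)
        else:
            text = child.read_text(encoding="utf-8")
        # Make sure the inserted text ends its own last line.
        return text if text.endswith("\n") else text + "\n"

    def splice(code: str) -> str:
        out, pos = [], 0
        for match in INPUT_RE.finditer(code):
            out.append(code[pos:match.start()])
            out.append(expand(match))
            pos = match.end()
        tail = code[pos:]
        if pos and not tail.strip() and out[-1].endswith("\n"):
            tail = ""  # avoid a blank line (a paragraph break) after a file
        out.append(tail)
        return "".join(out)

    pieces = []
    for line in path.read_text(encoding="utf-8").splitlines(keepends=True):
        cut = comment_start(line)
        # Only the part before a comment is searched for \input.
        pieces.append(splice(line[:cut]) + line[cut:])
    return "".join(pieces)
```

Calling `flatten(Path("base.tex"))` returns the merged text as a string, which you can then write to whatever output file you like. The default `depth=1` matches the non-recursive behaviour described above; passing a larger depth expands nested \input lines up to that many levels.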