[Tex/LaTex] Make a “standalone” version of multiple files: the whole document in one .tex file


Disclaimer: this question is not related to the standalone package.

When writing an article, I split the content across several files, notably to make version control easier.
At the time of writing, most of my articles look like the following:

my-split-file.tex:

\documentclass{article}
\input{packages}
\input{macros}
\addbibresource{short.bib}
\addbibresource{biblio.bib}

\begin{document}
    \input{content}
\end{document}

The file content.tex itself pulls in several other files with the \input command.

When the writing is over, I prefer to keep a single file that I can easily edit, share and back up.

Thanks to bibtool, I can easily extract the relevant entries from my (huge) .bib files:

bibtool -x my-split-file.aux -o standalone.bib

After that, I can include the relevant part of the .bib file in my .tex file thanks to the filecontents package.

Is there any way to do the same for the .tex files?
I'd like to produce a document in which:

  • each \input command is replaced by the actual content of the included file;
  • only the macros that are actually used are kept, and the others are left aside;
  • (Bonus) comments are removed in a “wise” manner: not the kind that lets me go to a new line without adding a space (as in

    I start a sentence%
    \footnote{I can start my footnote on a new line, for readability of the source code}
    and then go back to my sentence.
    

    ) but the kind that says “this proof is probably wrong” 😉 edit: it should probably also remove the empty lines left behind by the removal of the comments…

I am not afraid of some Linux scripts!

edit: since scripting seems the best way to handle this, I'd like the script to automatically run bibtool, extract the relevant part of the bibliography and put it in a filecontents environment.

edit: correct me if I'm wrong, but I know 5 ways to define (what I call) a macro:

  • \newcommand
  • \newcommand*
  • \renewcommand
  • \renewcommand*
  • \def

However, my macros.tex files are always cluttered with \hyphenation, \DefineBibliographyStrings and other biblatex options, \DeclareDocumentCommand, \NewDocumentCommand (from the xparse package), \DeclareMathOperator, \DeclareMathSymbol, etc.

Probably the method proposed by Pouya should be "reversed": remove the unused macros (defined with one of those methods) but leave the rest as it is (whereas Pouya just keeps the used macros, if I understood correctly).

Best Answer

Disclaimer!

Use this at your own risk. And back up EVERYTHING! Moreover, consider this as a starting point, since it is far from being perfect.


I have written a small bash script that can handle all three of your requirements in a fairly general fashion; however, it is better if you adapt it to your project structure.

It is a bash script and relies on GNU grep (for the -P option) and on GNU sed (for the way -i is used), so on OS X you will probably need to install the GNU versions of these tools. Apart from that, you need latexpand, which takes care of inlining your files.

Here I will explain how each part of the script works and how it manages to do what it does. That said, I would suggest using its parts separately (instead of running it as a whole); it probably gives you better control over what you want to do.

#!/bin/bash

# Defining some variables for your files:
# 1. The file containing all your macros
# 2. The file that the script uses to contain trimmed (i.e. only used) macros
# 3. Main .tex file
ORIGINAL_MACRO_FILE=macros_original.tex
TRIMMED_MACRO_FILE=macros_original_trimmed.tex
MAIN_FILE=paper.tex

if [[ -f macros_regex.txt || -f macros_regex_trimmed.txt || -f $TRIMMED_MACRO_FILE ]]; then
    echo "Some temp files are already here. Please get rid of them first..." >&2
    exit 1
fi

# Making a trimmed version of macro files:
echo -ne '\\b(' > macros_regex.txt
echo -ne '\\b(' > macros_regex_trimmed.txt
grep -Po '(?<=\\def\\)(.+?)(?=\{)' $ORIGINAL_MACRO_FILE | tr '\n' '\|' >> macros_regex.txt
sed -i 's/.$/\)\\b/' macros_regex.txt
grep -Porh --exclude=$ORIGINAL_MACRO_FILE $(cat ./macros_regex.txt) *.tex | sort | uniq | tr '\n' '\|' >> macros_regex_trimmed.txt
sed -i 's/.$/\)\\b/' macros_regex_trimmed.txt
grep -P $(cat ./macros_regex_trimmed.txt) $ORIGINAL_MACRO_FILE > $TRIMMED_MACRO_FILE

# Backing up the original macro file:
cp $ORIGINAL_MACRO_FILE $ORIGINAL_MACRO_FILE\_backup
mv $TRIMMED_MACRO_FILE $ORIGINAL_MACRO_FILE

# Inline everything to get one big file, using the trimmed macro file
perl latexpand --keep-comments $MAIN_FILE > inlined_paper.tex
rm macros_regex.txt macros_regex_trimmed.txt

# Putting back the original macro files:
rm $ORIGINAL_MACRO_FILE
mv $ORIGINAL_MACRO_FILE\_backup $ORIGINAL_MACRO_FILE

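# Removing comments: this deletes every line in which a % is followed by
# at least two characters, so a bare % at the end of a line is kept
sed -ri '/\%[^\n].+/ d' inlined_paper.tex

# Removing blank lines if you want
# (-i -e rather than -ie, which GNU sed reads as "in place, backup suffix e")
sed -i -e '/^$/ d' inlined_paper.tex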

It needs two files to start:

ORIGINAL_MACRO_FILE=macros_original.tex
MAIN_FILE=paper.tex

These are the file that contains all your macros and the main .tex file. TRIMMED_MACRO_FILE is a temporary file that holds the definitions of the used macros. The script then checks whether this temporary file or either of two other text files already exists, and continues only if none of them does (note that the script deletes these auxiliary files once it is done).

It first solves your second problem! It searches your macro file using the regular expression (?<=\\def\\)(.+?)(?=\{) and collects the names of all your macros. In my example I assumed that macros are defined in the form \def\name{...; however, if you are using other macro-definition commands, here are some regular expressions:

  • (?<=\\newcommand\{\\)(.+?)(?=\}) for newcommand
  • (?<=\\renewcommand\{\\)(.+?)(?=\}) for renewcommand
  • (?<=\\newcommand\*\{\\)(.+?)(?=\}) for newcommand*

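You can use the logical or operator (|) in your regex to match several of the aforementioned definitions, e.g.

((?<=\\def\\)|(?<=\\renewcommand\{\\))(.+?)(?=[{}])

matches both the def and renewcommand syntaxes. The final lookahead accepts either brace, because the name is followed by { after \def but by } after \renewcommand.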
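As a rough sketch of how the alternatives listed above might be folded into a single pattern, the function below uses \K instead of several lookbehinds. The name list_macro_names and the demo file are made up for illustration, and GNU grep with -P support is assumed.

```shell
#!/bin/bash
# list_macro_names FILE   (hypothetical helper, not part of the script above)
# Print the names (without the leading backslash) of the macros that FILE
# defines with \def, \newcommand, \renewcommand or their starred forms.
# \K discards everything matched so far, so only the name is printed.
list_macro_names() {
    grep -Po '\\(def|(re)?newcommand\*?\s*\{?)\s*\\\K[A-Za-z@]+' "$1"
}

# Demo on a throwaway file with made-up content.
demo_file=$(mktemp)
printf '%s\n' \
    '\def\foo{x}' \
    '\newcommand{\bar}[1]{#1}' \
    '\renewcommand*{\baz}{y}' \
    '\DeclareMathOperator{\tr}{tr}' > "$demo_file"
list_macro_names "$demo_file"   # prints foo, bar and baz, one per line
rm -f "$demo_file"
```

Lines such as \DeclareMathOperator are deliberately ignored here; adding more definition commands would mean extending the alternation.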

It then stores all the macro names in macros_regex.txt in the following form:

\b(amacro|anothermacro|foo|bar|etc)\b
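For illustration, the same kind of pattern can be built in one step with paste rather than with tr followed by sed. This is only a sketch: build_alternation is a hypothetical name, and refusing an empty list is my own choice (as far as I can tell, with no names at all the tr/sed pair would leave the invalid pattern \b)\b in the file).

```shell
#!/bin/bash
# build_alternation   (hypothetical helper)
# Read macro names on stdin, one per line, and print a single regex of the
# form \b(name1|name2|...)\b with duplicates removed.
build_alternation() {
    local names
    names=$(sort -u | paste -sd'|' -)
    [ -n "$names" ] || return 1      # no names: refuse to build an empty group
    printf '\\b(%s)\\b\n' "$names"
}

printf '%s\n' foo bar foo | build_alternation   # prints \b(bar|foo)\b
```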

Then, with the following command, it checks which of these macros have been used:

grep -Porh --exclude=$ORIGINAL_MACRO_FILE $(cat ./macros_regex.txt) *.tex | sort | uniq | tr '\n' '\|' >> macros_regex_trimmed.txt

This is what happens: grep -Porh means searching file contents with a Perl-compatible regex (-P), printing only the matching part of each line (-o), recursing into directories (-r) and omitting file names (-h). It also excludes your original macro file because obviously it would match all the patterns. Finally, we provide the pattern that we created previously via $(cat ./macros_regex.txt), and we search in all .tex files.

The output of grep is then sorted and the duplicates are removed (by piping to sort and uniq respectively). Then again, we create a regex from this output in the form of
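One caveat is that \b(foo|bar)\b also matches the plain word "bar" in running text. A possible variant, sketched here with a hypothetical helper called used_macro_names, only counts a name when it directly follows a backslash and is not the prefix of a longer command name. GNU grep is assumed.

```shell
#!/bin/bash
# used_macro_names DIR MACRO_FILE NAME...   (hypothetical helper)
# Print those NAMEs that occur as \NAME in the *.tex files under DIR,
# skipping MACRO_FILE itself (matched by its base name).
used_macro_names() {
    local dir=$1 macro_file=$2
    shift 2
    local alternation
    alternation=$(printf '%s\n' "$@" | paste -sd'|' -)
    # \\ matches the backslash, \K drops it from the output, and the
    # negative lookahead rejects longer names such as \foobar.
    grep -Pohr --include='*.tex' --exclude="$(basename "$macro_file")" \
        '\\\K('"$alternation"')(?![A-Za-z@])' "$dir" | sort -u
}

# Demo in a throwaway directory with made-up content.
demo_dir=$(mktemp -d)
printf '%s\n' '\def\foo{x}' '\def\bar{y}' > "$demo_dir/macros.tex"
printf '%s\n' 'We use \foo here, and the plain word bar.' > "$demo_dir/body.tex"
used_macro_names "$demo_dir" "$demo_dir/macros.tex" foo bar   # prints foo
rm -rf "$demo_dir"
```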

\b(anothermacro|foo|etc)\b

But this time it only contains the used macros. Finally, the script greps the original macro file with this regex and saves the matching lines in TRIMMED_MACRO_FILE. To summarize, if we have a file in the form of:

% original macros
\def\bfa{{\mbox{\boldmath $a$}}}
\def\bfb{{\mbox{\boldmath $b$}}}
\def\bfc{{\mbox{\boldmath $c$}}}
\def\bfd{{\mbox{\boldmath $d$}}}
\def\bfe{{\mbox{\boldmath $e$}}}
\def\bff{{\mbox{\boldmath $f$}}}

after this stage, we have:

% trimmed macro file
\def\bfb{{\mbox{\boldmath $b$}}}
\def\bfc{{\mbox{\boldmath $c$}}}
\def\bff{{\mbox{\boldmath $f$}}}

which are the definitions of the macros that have been used in your project.
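The last edit of the question asks for the reverse filter: drop only the definitions of unused macros and leave every other line (\hyphenation, \DeclareMathOperator, ...) untouched. A minimal sketch of that idea follows, assuming each definition fits on one line; drop_unused_defs is a hypothetical name, and the list of unused names has to be computed beforehand.

```shell
#!/bin/bash
# drop_unused_defs MACRO_FILE NAME...   (hypothetical helper)
# Print MACRO_FILE without the one-line \def, \newcommand(*) or
# \renewcommand(*) definitions of the given unused NAMEs.
# All other lines are passed through unchanged.
drop_unused_defs() {
    local macro_file=$1
    shift
    if [ "$#" -eq 0 ]; then          # nothing to drop: copy the file as is
        cat "$macro_file"
        return
    fi
    local alternation
    alternation=$(printf '%s\n' "$@" | paste -sd'|' -)
    grep -Pv '^\s*\\(def|(re)?newcommand\*?\s*\{?)\s*\\('"$alternation"')(?![A-Za-z@])' \
        "$macro_file"
}

# Demo on a throwaway file with made-up content.
demo_file=$(mktemp)
printf '%s\n' \
    '% original macros' \
    '\def\bfa{{\mbox{\boldmath $a$}}}' \
    '\def\bfb{{\mbox{\boldmath $b$}}}' \
    '\hyphenation{stand-alone}' > "$demo_file"
drop_unused_defs "$demo_file" bfa   # keeps every line except the \bfa one
rm -f "$demo_file"
```

Definitions that span several lines would lose only their first line with this approach, so it is worth diffing the result against the original.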

Now, here I explain why I solved your second problem first :D. The idea is that once we have the trimmed version of the file, we swap the original and the trimmed macro files and then expand/inline everything. This is done by backing up the original macro file, renaming the trimmed one and finally using latexpand to inline everything.

perl latexpand --keep-comments $MAIN_FILE > inlined_paper.tex
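If the inlining step fails halfway, the original macro file is still sitting under its backup name. One way to guard against that, sketched with a hypothetical wrapper called with_trimmed_macros, is to restore the original whatever the exit status of the wrapped command is (it does not protect against the shell being killed).

```shell
#!/bin/bash
# with_trimmed_macros ORIGINAL TRIMMED COMMAND...   (hypothetical helper)
# Run COMMAND while ORIGINAL is temporarily replaced by a copy of TRIMMED,
# then put ORIGINAL back and return COMMAND's exit status.
with_trimmed_macros() {
    local original=$1 trimmed=$2
    shift 2
    local backup="${original}_backup"
    cp -- "$original" "$backup" || return 1
    if ! cp -- "$trimmed" "$original"; then
        mv -- "$backup" "$original"
        return 1
    fi
    "$@"
    local status=$?
    mv -- "$backup" "$original"      # restore even if COMMAND failed
    return "$status"
}

# Possible use with the file names of the script above:
# with_trimmed_macros macros_original.tex macros_original_trimmed.tex \
#     sh -c 'perl latexpand --keep-comments paper.tex > inlined_paper.tex'
```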

latexpand is a Perl program that takes care of \input and \include. As you see, I have used the --keep-comments flag to preserve the comments. If you don't, it will nicely clean out all the comments; however, this cleaning includes the comments that you said need to be kept. Cleaning the comments is a simple sed one-liner that deletes every line matching the pattern \%[^\n].+. That regex means a percent sign that is not directly followed by the end of the line but by at least two more characters (note that the whole line is deleted, including any text before the percent sign). Finally, if you want to remove blank lines, you can use the last command, i.e. sed -i -e '/^$/ d' inlined_paper.tex, or comment it out otherwise.
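Since that one-liner drops whole lines, here is a more conservative sketch: it deletes lines that are nothing but a comment, and on mixed lines it cuts the comment text while keeping the % itself, so the spacing of the output should not change. strip_comments is a hypothetical name; verbatim environments and \verb are not handled.

```shell
#!/bin/bash
# strip_comments [FILE...]   (hypothetical helper; reads stdin if no FILE)
# 1st expression: delete lines that contain only a non-empty comment.
# 2nd expression: on other lines, find the first unescaped % (the prefix is
#   made of backslash pairs such as \% or \\ and of ordinary characters)
#   and replace "% comment text" by a bare %.
strip_comments() {
    sed -E \
        -e '/^[[:space:]]*%.+$/d' \
        -e 's/^((\\.|[^\\%])*)%.+$/\1%/' \
        "$@"
}

printf '%s\n' \
    '% this proof is probably wrong' \
    'I start a sentence%' \
    '\footnote{note} % remark' \
    '50\% of cases' | strip_comments
```

On this input the first line disappears, the second and fourth are left alone, and the third becomes \footnote{note} followed by a space and a bare %.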


As you see, this is a script that can do the job, but it should be customized to your project structure and commands. Again, I would suggest using the different parts of this code separately instead of running it as a whole. For instance, the line that removes the comments is a useful stand-alone one-liner.

Finally, I suggest sticking to latexpand, as it is a professional tool designed for this purpose, rather than this script, which I created because my other code was not compiling (this says a lot!) and I was bored.

P.S. I assumed the reader is fairly familiar with basic bash commands such as cp, mv and grep. If you find this answer not verbose enough, please leave a comment.