Disclaimer: this question is not related to the standalone package.
When writing an article, I split the content between several files, notably to ease versioning.
At the time of writing, most of my articles look like the following:
my-split-file.tex:
\documentclass{article}
\input{packages}
\input{macros}
\addbibresource{short,biblio}
\begin{document}
\input{content}
\end{document}
The file content.tex itself includes several files, via the \input command.
When the writing is over, I prefer to keep a single file that I can easily edit, share and back up.
Thanks to bibtool, I can easily extract the relevant entries from my (huge) .bib files:
bibtool -x my-split-file.aux -o standalone.bib
After that, I can include the relevant part of the .bib file in my .tex thanks to the filecontents package.
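For reference, this bibliography step can be sketched as a short shell script. The file names standalone.bib and standalone-bib.tex are my own choice, and the bibtool call is commented out (with a demo .bib in its place) so the sketch also runs without bibtool installed:

```shell
#!/bin/sh
# Step 1 (requires bibtool): keep only the entries cited in the .aux file.
# bibtool -x my-split-file.aux -o standalone.bib

# Demo .bib file, created here only so the sketch runs standalone:
[ -f standalone.bib ] || printf '@article{demo,\n  title = {Demo}\n}\n' > standalone.bib

# Step 2: wrap the extracted entries in a filecontents environment,
# ready to paste at the top of the single-file version of the article.
{
  printf '\\begin{filecontents}{standalone.bib}\n'
  cat standalone.bib
  printf '\\end{filecontents}\n'
} > standalone-bib.tex
```

The resulting standalone-bib.tex can then be prepended to the merged .tex file.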
Is there any way to do the same for the .tex files?
I'd like to produce a document that:
- replaces the \input commands with the actual content of the included files;
- extracts the macros that are used and leaves the others aside;
- (bonus) removes the comments in a “wise” manner: not the ones that help to go to a new line without adding a space (as in
I start a sentence% \footnote{I can start my footnote on a new line, for readability of the source code} and then go back to my sentence.
) but the ones that say “this proof is probably wrong” 😉
edit: it should probably also remove the empty lines created by the removal of the comments…
I am not afraid of some Linux scripting!
edit: since scripting seems the best way to handle this, I'd like the script to automatically run bibtool, extract the relevant part of the bibliography and put it in a filecontents environment.
edit: correct me if I'm wrong, but I know 5 methods to define (what I call) a macro:
\newcommand
\newcommand*
\renewcommand
\renewcommand*
\def
However, my macros.tex files are always cluttered with \hyphenation, \DefineBibliographyStrings and other biblatex options, \DeclareDocumentCommand, \NewDocumentCommand (from the xparse package), \DeclareMathOperator, \DeclareMathSymbol, etc.
Probably the method proposed by Pouya should be “reversed”: remove the unused macros (defined with one of those methods) but leave everything else as it is (whereas, if I understood correctly, Pouya keeps only the used macros).
Best Answer
Disclaimer!
Use this at your own risk, and back up EVERYTHING! Moreover, consider this a starting point, since it is far from perfect.
I have written a small bash script that can handle all three of your requirements in a fairly general fashion; however, it is better if you personalize it based on your project structure.
It should be compatible with any standard Unix shell (including OS X). Apart from that, you need latexpand, which will take care of inlining your files.
Here I will explain how each part of the script works and how it manages to do what it does, although I would suggest using its parts separately (instead of running it as a whole). That probably gives you better control over what you want to do.
The script needs two files to start: the file that contains all your macros, and the main .tex file. TRIMMED_MACRO_FILE is a temporary file that holds the list of used macros. The script first checks whether this temp file, as well as two other auxiliary text files, already exists, and only continues if they do not (note that the script deletes these auxiliary files once it is done).
It first solves your second problem! It searches your macro file with the regular expression
(?<=\\def\\)(.+?)(?=\{)
, and collects the names of all your macros. In my example I assumed macros are of the form \def\name{...}; however, if you are using other macro-definition commands, here are some regular expressions:
(?<=\\newcommand\{\\)(.+?)(?=\}) for \newcommand
(?<=\\renewcommand\{\\)(.+?)(?=\}) for \renewcommand
(?<=\\newcommand\*\{\\)(.+?)(?=\}) for \newcommand*
You can use the logical “or” operator (|) in your regex to combine several of the aforementioned definitions, e.g. a single pattern that handles both the \def and \renewcommand syntaxes. The script then stores all of the macro names in macros_regex.txt as one alternation pattern. Then, using the next line, it checks which of these macros have actually been used:
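Both stages — collecting the names into macros_regex.txt and searching for their usages — can be sketched as follows. This is a minimal sketch, not the answer's exact script: the file names macro_names.txt and used_names.txt are my own choice, GNU grep is assumed for the -P flag, and a tiny demo project is created so the sketch runs on its own:

```shell
#!/bin/sh
# Demo project, created only so the sketch is self-contained:
printf '\\def\\foo{F}\n\\def\\bar{B}\n' > macros.tex
printf 'Some text using \\foo{} only.\n' > content.tex

# Step 1: collect every name defined as \def\name{...}
# (GNU grep is needed for the -P Perl-regex mode):
grep -Po '(?<=\\def\\)(.+?)(?=\{)' macros.tex > macro_names.txt

# A combined pattern could also catch \newcommand definitions, e.g.:
#   (?<=\\def\\)(.+?)(?=\{)|(?<=\\newcommand\{\\)(.+?)(?=\})

# Step 2: turn the name list into one alternation pattern, \\foo|\\bar,
# so a later grep -P can match the macros wherever they are used:
sed 's/^/\\\\/' macro_names.txt | paste -sd'|' - > macros_regex.txt

# Step 3: search every .tex file except the macro file itself for
# usages, keep only the matching parts, and deduplicate:
grep -Porh --include='*.tex' --exclude='macros.tex' \
    -e "$(cat macros_regex.txt)" . | sort | uniq > used_names.txt
```

After this, used_names.txt holds only the macros that actually occur in the document sources.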
This is what happens: grep -Porh means search file contents using Perl regexes (-P), print only the matching parts (-o), search recursively (-r) and omit the file names (-h). The command also excludes your original macro file, because that would obviously match all the patterns. Finally, we provide the pattern we created previously via $(cat ./macros_regex.txt), and we search recursively in all .tex files.
The results are then sorted and the duplicates removed (pipes to sort and uniq, respectively). Then, again, we build a regex from this output in the same alternation form, but this time it contains only the used macros. Finally, we grep the original macro file with it and save the result in TRIMMED_MACRO_FILE. To summarize: starting from a file that contains all your macro definitions, after this stage we are left with only the definitions of the macros that have actually been used in your project.
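The trimming stage can be sketched like this (again a standalone sketch with demo inputs standing in for the previous steps' outputs; I use grep -F for the filtering, which treats each used name as a fixed string — simpler than rebuilding a regex, though it would also match macros whose names are prefixes of longer ones):

```shell
#!/bin/sh
# Demo inputs, created only so the sketch runs on its own; in the real
# script these come from the earlier collection and search steps:
printf '\\def\\foo{F}\n\\def\\bar{B}\n' > macros.tex
printf '\\foo\n' > used_names.txt
TRIMMED_MACRO_FILE=trimmed_macros.tex

# Keep only the definition lines of the macros that are actually used:
grep -F -f used_names.txt macros.tex > "$TRIMMED_MACRO_FILE"
```

Here trimmed_macros.tex ends up with the \def\foo line only, since \bar never appears in used_names.txt.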
Now, here is why I solved your second problem first
:D
. The idea is that once we have the trimmed version of the file, we swap the original and the trimmed macro files and then expand/inline everything. This is done by backing up the original macro file, renaming the trimmed one and finally using latexpand to inline everything. latexpand is a Perl program that takes care of \input and \include. As you see, I have used the --keep-comments flag to preserve the comments. If you don't, it nicely cleans out all the comments; however, this cleaning also removes the comments that, as you mentioned, need to be kept. Cleaning the other comments is a simple sed one-liner that replaces the pattern \%[^\n].+ with nothing. That regex means: a percent sign that is not directly followed by a newline but by one or more characters of any kind. Finally, if you want to remove the blank lines, you can use the last command, i.e. sed -ie '/^$/ d' inlined_paper.tex, or comment it out otherwise.
As you see, this is a script that can do the job, but it should be customized based on your project structure and commands. Again, I would suggest using the different parts of this code separately instead of running it as a whole. For instance, the line that removes the comments is a useful stand-alone one-liner.
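The cleaning stage can be sketched portably like this (file names are my assumption; the latexpand call is commented out, with a demo file in its place, so the sketch runs without latexpand installed). Instead of sed -i, which behaves differently between GNU and BSD sed, the sketch pipes into a new file:

```shell
#!/bin/sh
# Inline everything first (requires latexpand):
# latexpand --keep-comments my-split-file.tex > inlined_paper.tex

# Demo input, created only so the sketch runs on its own. The first line
# carries a throwaway comment; the second ends in a bare % that joins
# lines without adding a space and must survive; then a blank line.
{
  printf 'A phrase %% this proof is probably wrong\n'
  printf 'start%%\n'
  printf '\n'
  printf 'end.\n'
} > inlined_paper.tex

# Remove comments that carry text after the percent sign, but keep a
# bare trailing "%"; then drop blank (or whitespace-only) lines.
# Caveat: an escaped \% would also be caught, a limitation shared with
# the regex in the answer above.
sed 's/%..*//' inlined_paper.tex | sed '/^[[:space:]]*$/d' > cleaned_paper.tex
```

In the demo, cleaned_paper.tex keeps the line-joining "start%" intact while the throwaway comment and the blank line are gone.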
Finally, I suggest sticking to latexpand, as it is a professional tool designed for this purpose, rather than to this script, which I created because my other code was not compiling (this says a lot!) and I was bored.
P.S. I assumed the reader has a fair familiarity with basic bash commands such as cp, mv and grep. If you find this answer not verbose enough, please leave a comment.