Have a look at biber: the current 1.5 dev version on SourceForge has a new "tool" mode which lets you use biber's reencoding and source mapping features independently of biblatex. From your description, the source mapping features are mainly what you need, and they are all documented in the PDF manual. I can provide specific examples if you have specific questions. biber will do everything you mention above apart from the @string expansion, which would be possible to add but, as you say, is fairly idiosyncratic.
Of course, you can do this dynamically with biber too - with the changes being applied as the .bib is read but the .bib is not touched. The new tool mode allows you to write the changed .bib to another file without writing a .bbl.
For example, here is how to tackle points 2, 3, 5 and 6 in your examples in tool mode. Point 1 is better handled semantically with biblatex and its max/min names options. Create a biber.conf with:
<config>
  <sourcemap>
    <maps datatype="bibtex" map_overwrite="1">
      <map>
        <map_step map_field_set="issn" map_null="1"/>
      </map>
      <map>
        <per_type>ARTICLE</per_type>
        <map_step map_field_set="title" map_null="1"/>
      </map>
      <map>
        <per_type>ARTICLE</per_type>
        <map_step map_field_source="pages" map_final="1"/>
        <map_step map_field_set="archiveprefix" map_null="1"/>
        <map_step map_field_set="eprint" map_null="1"/>
        <map_step map_field_set="primaryclass" map_null="1"/>
      </map>
      <map>
        <map_step map_field_source="doi" map_match="[\\;]" map_final="1"/>
        <map_step map_field_set="doi" map_null="1"/>
      </map>
    </maps>
  </sourcemap>
</config>
Then run biber with
biber --tool file.bib
which will look in the default locations for your biber.conf and will output a file called file_bibertool.bib.
As I said, all of this is also possible dynamically, using the biber.conf as you process the file normally into a .bbl with biber. The whole mapping functionality is also available in biblatex through macros (see \DeclareSourcemap in the biblatex documentation) if you want to do this dynamically on a per-document basis.
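As a rough sketch of the per-document route, the first three maps from the biber.conf above could be written with \DeclareSourcemap in the preamble roughly like this. It assumes biblatex is loaded with backend=biber, and it leaves out the doi map, so treat it as a starting point rather than a tested drop-in replacement:

```latex
% Preamble sketch; assumes \usepackage[backend=biber]{biblatex}
\DeclareSourcemap{
  \maps[datatype=bibtex, overwrite]{
    % Drop the issn field from every entry
    \map{
      \step[fieldset=issn, null]
    }
    % Drop the title field from @article entries only
    \map{
      \pertype{article}
      \step[fieldset=title, null]
    }
    % For @article entries that have a pages field, drop the eprint data;
    % "final" ends this map for entries without a pages field
    \map{
      \pertype{article}
      \step[fieldsource=pages, final]
      \step[fieldset=archiveprefix, null]
      \step[fieldset=eprint, null]
      \step[fieldset=primaryclass, null]
    }
  }
}
```

These maps are applied as the .bib is read for that document only; the .bib file itself is not modified.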
latexmk is the answer you are looking for.
LaTeX is notoriously difficult to "get right" using a Makefile, because it might take multiple compiler passes - updating e.g. .aux files - to get the finished results. Getting this right in a general Makefile (as opposed to one tailored to a specific document) is very hard, which is why there are pre-made solutions. Of these, latexmk comes included with your average LaTeX distribution, which is why I consider it first choice.
The trick is to provide a Makefile rule for every custom step x-to-TeX (or x-to-PDF or whatever) you might have, and to have latexmk figure out all the LaTeX-related stuff while relying on the Makefile for the rest (via -use-make).
# You want latexmk to *always* run, because make does not have all the info.
# Also, include non-file targets in .PHONY so they are run regardless of any
# file of the given name existing.
.PHONY: MyDoc.pdf all clean

# The first rule in a Makefile is the one executed by default ("make"). It
# should always be the "all" rule, so that "make" and "make all" are identical.
all: MyDoc.pdf

# CUSTOM BUILD RULES
# In case you didn't know, '$@' is a variable holding the name of the target,
# and '$<' is a variable holding the (first) dependency of a rule.
# "raw2tex" and "dat2tex" are just placeholders for whatever custom steps
# you might have.
# (Recipe lines must start with a tab character, not spaces.)
%.tex: %.raw
	./raw2tex $< > $@

%.tex: %.dat
	./dat2tex $< > $@

# MAIN LATEXMK RULE
# -pdf tells latexmk to generate PDF directly (instead of DVI).
# -pdflatex="" tells latexmk to call a specific backend with specific options.
# -use-make tells latexmk to call make for generating missing files.
# -interaction=nonstopmode keeps the pdflatex backend from stopping at a
# missing file reference and interactively asking you for an alternative.
MyDoc.pdf: MyDoc.tex
	latexmk -pdf -pdflatex="pdflatex -interaction=nonstopmode" -use-make MyDoc.tex

clean:
	latexmk -CA
This setup works flawlessly for anything referenced via \include.
However, \include might not be appropriate in every case. For one, it is not nestable (i.e. an \included file may not \include another). It also adds an automatic \clearpage to your document, i.e. \included content starts a new page. It also has advantages, like resulting in shorter re-build times if contents are modified, but sometimes you need nesting, or the referenced file's contents should be embedded in a page.
You need \input for this.
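To make the difference concrete, here is a minimal sketch. The file names chapter1, table1 and table1-body are made up for illustration:

```latex
% MyDoc.tex
\documentclass{report}
\begin{document}

% \include starts a new page, and chapter1.tex must not \include anything itself
\include{chapter1}

% \input inserts the file's contents right here, without a page break
Some text before the table.
\input{table1}
Some text after the table.

\end{document}

% table1.tex may itself pull in further files, because \input can be nested:
%   \input{table1-body}
```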
Sadly, \input breaks the build. If pdflatex encounters a missing \input file, it generates an error (instead of a warning like with \include), and stops compiling. Yes, latexmk will generate the file and re-start pdflatex, but this is inefficient, and breaks completely if you have multiple such file references, because eventually the compile will end with a "too many re-runs" message.
John Collins' answer to a question by me regarding this problem provides a workaround:
\newcommand\inputfile[1]{%
\InputIfFileExists{#1}{}{\typeout{No file #1.}}%
}
This macro generates a warning instead of the error of a straight \input, and allows latexmk to generate all missing files in the first pass.
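As a sketch of how this might look in a document, assume two hypothetical sources, results.raw and timings.dat, which the custom Makefile rules turn into results.tex and timings.tex:

```latex
% MyDoc.tex
\documentclass{article}

% Warn instead of failing when a generated file does not exist yet
\newcommand\inputfile[1]{%
  \InputIfFileExists{#1}{}{\typeout{No file #1.}}%
}

\begin{document}

Measured values:
\inputfile{results.tex}   % hypothetical, built from results.raw

Timings:
\inputfile{timings.tex}   % hypothetical, built from timings.dat

\end{document}
```

On the first pass both files are reported as missing instead of aborting the run, so latexmk (with -use-make) can ask make for both of them before re-running pdflatex.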
Note: A rule with the generic target %.pdf: %.tex gives you trouble once you start using \includeonly in your document, for reasons internal and complex. That's why I used a specific rule instead of a generic one.
There is actually one alternative to latexmk that I can also recommend. In case you are looking at a more involved project setup, you might consider CMake, for which Kenneth Moreland has done the excellent UseLATEX.cmake module.
This, however, is a bit too involved to give a how-to in the scope of this answer.
Latexmk is one possibility, although I've never used it myself.