Advice for merging book volumes of a series (each with its own toc and index) into a final single PDF

combinefiles

I'm early into a project where I will have five PDFs, each of which will be a two-sided book that includes its own TOC and index. (One is already complete.)

Each of the five PDFs is considered a Volume in a Series.

I'm actually fine with having five separate files.

However, I'm wondering if it's easy or painful to merge the volumes into a single file (PDF) when each volume needs to maintain its own book-level TOC at the start and its own book-level index at the end.

Ideally, I'd like each volume (book) to be treated as a single chapter in the merged overall TOC. I.e., in the overall document, chapter one would be Volume I, chapter two would be Volume II, etc., and, again, each volume has its own TOC and index.

Can this be done outside LaTeX? Inside LaTeX? Is it difficult enough that I should just keep the files separate?

I'm asking now since I am early into structuring the project, and am basically looking for advice.

Thanks,

Best Answer

Combining/concatenating multiple volumes into a single .pdf-file is doable but, depending on your needs, not trivial.

For example, the questions arise:

  • Will there be cross-references and/or hyperlinks between different volumes?
  • Should all hyperlinks and bookmarks etc contained in the .pdf-files of single volumes be preserved in the .pdf-file containing the entire series?
  • Will each volume have its own title-pages, \frontmatter and \backmatter/\appendix?
  • How do you invoke TeX? Do you use a TeX distribution like TeX Live or MiKTeX installed on your local machine, so that all additional programs that come with the distribution are accessible to you? Or do you use an online frontend like Overleaf, where the additional programs of the TeX distribution installed on the server/in the Docker container are not accessible, and where you cannot easily keep several .dvi-files/.pdf-files/TeX output files side by side because some script ensures that the output file of each TeX run is always named output.pdf?

In general, I see five possibilities:

  1. Creation of the LaTeX source code in such a way that volumes can be compiled directly as .pdf-files, either individually as single volumes or as an entire series.

    • E.g., a "mechanism" is feasible where compiling the same .tex-source yields different results, e.g., depending on the value provided on the command-line with the --jobname-option for determining the name of the .log-file and the resulting .pdf-file.
    • E.g., having the code of a .tex-file check if \documentclass was already issued and—if not—via \input loading a file containing a document-preamble is feasible. Approach 3 of my answer to Using newcommand in multiple subfiles with standalone provides an example of how this could be done.
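
    A minimal sketch of the --jobname idea, under stated assumptions: the file names series.tex, volume1-body.tex and volume2-body.tex, the job names volume1/volume2, and the helper \IfJobname are all made up for illustration and are not part of any package. The filecontents* environments are only there so that the example is self-contained.

    ```latex
    % series.tex (hypothetical name). Possible invocations:
    %   pdflatex --jobname=volume1 series.tex   -> volume1.pdf, volume I only
    %   pdflatex --jobname=volume2 series.tex   -> volume2.pdf, volume II only
    %   pdflatex series.tex                     -> series.pdf, entire series
    \begin{filecontents*}{volume1-body.tex}
    \chapter{First chapter of volume I}
    Text of volume I.
    \end{filecontents*}
    \begin{filecontents*}{volume2-body.tex}
    \chapter{First chapter of volume II}
    Text of volume II.
    \end{filecontents*}

    \documentclass{book}

    \makeatletter
    % \IfJobname{<name>}{<true>}{<false>} -- hypothetical helper.
    % \jobname delivers characters of category code 12 (other), so the
    % string to compare against is passed through \detokenize first.
    \newcommand*\IfJobname[1]{%
      \edef\@tempa{\jobname}%
      \edef\@tempb{\detokenize{#1}}%
      \ifx\@tempa\@tempb
        \expandafter\@firstoftwo
      \else
        \expandafter\@secondoftwo
      \fi
    }
    \makeatother

    \begin{document}
    \IfJobname{volume1}{\input{volume1-body}}{%
      \IfJobname{volume2}{\input{volume2-body}}{%
        % Any other job name: typeset the entire series.
        \input{volume1-body}%
        \input{volume2-body}%
      }%
    }
    \end{document}
    ```

    Because every job name gets its own .aux-, .toc- and .log-file, the auxiliary data of the single volumes and of the entire series do not overwrite each other.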

    If you go for one of these routes, you need to do something about cross-references/hyperlinks leading from one single volume to another, e.g., using the package xr-hyper.
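
    A rough sketch of what referencing another volume with xr-hyper could look like. The file names volume1/volume2, the prefix V1: and the label sec:intro are assumptions for illustration; volume1.tex must have been compiled beforehand (with hyperref loaded) so that volume1.aux exists and contains that label.

    ```latex
    % volume2.tex (hypothetical name)
    \documentclass{book}
    \usepackage{xr-hyper}% load before hyperref
    \usepackage{hyperref}

    % Import the labels found in volume1.aux. The prefix V1: keeps them
    % apart from labels of the same name in this volume; the trailing
    % optional argument names the file the hyperlinks shall point to.
    \externaldocument[V1:]{volume1}[volume1.pdf]

    \begin{document}
    \chapter{A chapter of volume II}
    See section~\ref{V1:sec:intro} on page~\pageref{V1:sec:intro} of volume~I.
    \end{document}
    ```

    Links created this way point into the separate file volume1.pdf, so they need rework if the volumes later end up in one combined .pdf-file.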

    In any case you need to be familiar with the pitfalls of LaTeX code. For example, if you use hyperref to have LaTeX create hyperlinks automatically, you cannot simply reset counters (e.g., the page counter or the chapter counter at the start of a new volume of the series) between the snippets of TeX code for individual volumes. The names of linkable named destinations are derived from the names and values of LaTeX counters, so after a reset they are no longer unique: the hyperref package might attempt to place several named destinations of the same name into the .pdf-file, which is not a good idea and which makes pdfTeX-based engines complain about duplicate destinations.

  2. Creating .pdf-files for individual volumes and using programs to merge/concatenate multiple .pdf-files into a single .pdf-file containing the entire series. This can be done e.g. with the program PDFtk. Or with the program pdfunite from the poppler-utils. Or with LaTeX, using the package pdfpages, which makes it possible to take over all pages of .pdf-files and/or to take over only certain pages of .pdf-files into the document to be generated during the LaTeX run currently taking place.

    As far as I know, however, these approaches have the disadvantage of disabling/removing hyperlinks in the volumes being combined. You may also lose bookmarks contained in the .pdf-files of individual volumes. PDF form elements are also a problem with this approach, but I think that books containing fillable forms are rather rare/unlikely.
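
    A minimal sketch of the pdfpages variant, assuming that volume1.pdf and volume2.pdf (hypothetical names) already exist in the working directory. The addtotoc option is used to list each volume as a chapter in the table of contents of the combined document; the headings and the labels vol:one/vol:two are made up.

    ```latex
    % series.tex (hypothetical name); compile twice with pdflatex
    \documentclass[twoside]{book}
    \usepackage{pdfpages}

    \begin{document}
    \tableofcontents

    % pages=- takes over all pages of the file.
    % addtotoc={<page of the inserted file>,<sectioning type>,<level>,<heading>,<label>}
    \includepdf[pages=-,
                addtotoc={1,chapter,0,Volume I,vol:one}]{volume1.pdf}
    \includepdf[pages=-,
                addtotoc={1,chapter,0,Volume II,vol:two}]{volume2.pdf}
    \end{document}
    ```

    The included pages are placed like images, so each volume keeps the appearance of its own table of contents and index, but the links inside the volumes are not carried over.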

  3. Donald E. Knuth developed TeX when PDF (Portable Document Format) did not yet exist. Back then the output format of TeX was not PDF but dvi (here an abbreviation for "device independent file format", not "digital visual interface" ;-)). Before Hàn Thế Thành developed pdfTeX, it was common to have TeX create .dvi-files and to have other programs convert them to .pdf-files. This is still possible today. So one can still use "oldschool" LaTeX to generate .dvi-files instead of .pdf-files, then merge the .dvi-files of the individual volumes into one .dvi-file containing the entire series, and then convert the .dvi-file containing the entire series into a .pdf-file.

    With this approach it is possible to some extent, by a lot of trickery, to preserve hyperlinks within the single volumes, or even to make things so that hyperlinks occur in the .pdf-file containing the entire series for navigating back and forth between individual volumes.
    If the TeX installation on your computer is properly configured, it is possible to use pdfTeX-based engines while working on the individual volumes, and only in the last step, when the source texts for the individual volumes are ready, to work with traditional LaTeX and create .dvi-files for merging.

    Things that are only possible with TeX engines running in pdf-mode cannot be done with this approach, because everything also needs to work when compiling to a .dvi-file with TeX engines running in dvi-mode. Also, I have not yet managed to get such an approach to work with XeTeX-based Unicode-capable engines: in XeTeX, the dvi format is replaced by the "extended dvi format", file extension .xdv, and I have not yet found software for combining .xdv-files in the way dviconcat can be used for combining .dvi-files. Thus the features of XeTeX are not available with this approach either.

    In my answer to Cross-reference with xr package and final PDF combination? I elaborated on this approach.

    Although there are problems and restrictions with this approach, I encountered scenarios where I opted for it because it makes it possible to combine, without losing hyperlinks, several volumes each of which has its own title pages, \frontmatter and \backmatter/\appendix. There are documentclasses where you can divide things into \parts, but definitely not all documentclasses let you have several instances of titlepage/\frontmatter/\backmatter/\appendix etc.

  4. Using the package docstrip you can maintain the .tex-source of your entire work within a set of .tex-files where tags can be used for denoting the volume to which a portion of code shall belong.
    docstrip's \generate-command can be used for extracting/copying from that set of .tex-files into a new .tex-file those portions of code that are needed for making up a specific volume or those portions of code that are needed for making up the entire series.
    The .tex-file generated by docstrip then can be compiled by running latex on it. If you do this, you need to do something about cross-references/hyperlinks leading from one single volume to another, e.g., using the package xr-hyper.
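
    A small sketch of how this could look. The file names series-source.tex and series.ins and the tags vol1/vol2 are assumptions for illustration. First the tagged source file:

    ```latex
    % series-source.tex (hypothetical name)
    % Note: docstrip drops lines that begin with a single percent sign,
    % so these comment lines do not reach the generated files.
    \documentclass{book}
    \begin{document}
    %<*vol1>
    \chapter{Material belonging to volume I}
    Text of volume I.
    %</vol1>
    %<*vol2>
    \chapter{Material belonging to volume II}
    Text of volume II.
    %</vol2>
    \end{document}
    ```

    Then a batch file that extracts the single volumes and the entire series, to be processed with, e.g., latex series.ins:

    ```latex
    % series.ins (hypothetical name)
    \input docstrip.tex
    \keepsilent
    \askforoverwritefalse
    \nopreamble
    \nopostamble
    \generate{%
      \file{volume1.tex}{\from{series-source.tex}{vol1}}%
      \file{volume2.tex}{\from{series-source.tex}{vol2}}%
      \file{series.tex}{\from{series-source.tex}{vol1,vol2}}%
    }
    \endbatchfile
    ```

    Lines outside any guard are copied into every generated file; guarded lines are copied only when the corresponding tag is among the options given in \from.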

  5. Probably you can use LaTeX's \include..\includeonly-feature and either compile with the sources for all volumes included to obtain a .pdf-file that contains the entire series, or compile with only the sources of a specific volume included to obtain a .pdf-file for that single volume only. If you do this, you need to do something about cross-references/hyperlinks leading from one single volume to another, but this will be tricky, and there is a chance that automatically created hyperlinks not coming from cross-referencing commands are broken if the destination/target/anchor is in the .pdf-file of another volume.
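
    A minimal sketch of the \include..\includeonly route, assuming one file per volume (volume1.tex and volume2.tex are hypothetical names, created here via filecontents* only to keep the example self-contained) and using \part for marking the volumes:

    ```latex
    % series.tex (hypothetical name)
    \begin{filecontents*}{volume1.tex}
    \part{Volume I}
    \chapter{First chapter of volume I}
    Text of volume I.
    \end{filecontents*}
    \begin{filecontents*}{volume2.tex}
    \part{Volume II}
    \chapter{First chapter of volume II}
    Text of volume II.
    \end{filecontents*}

    \documentclass{book}

    % Uncomment for typesetting volume II only. Compile the entire
    % series at least once beforehand, so that the .aux-files of the
    % excluded volumes exist and can supply counter values and labels.
    %\includeonly{volume2}

    \begin{document}
    \tableofcontents
    \include{volume1}
    \include{volume2}
    \end{document}
    ```

    The .aux-files of excluded volumes are still read, so numbering and \ref/\pageref stay consistent, but the pages those references point to are not part of the resulting .pdf-file.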


With all approaches, however, one will not get around coping with the internals of the LaTeX kernel, of the document class in use, of the hyperref package, and possibly of other packages in use.

For this you need to be familiar with the LaTeX internals, e.g., how mechanisms like \tableofcontents and \label..\ref are implemented and how hyperref and similar packages change these mechanisms. To understand what is going on and to be able to adapt things to your own needs, you need to read the (commented) source code of the LaTeX kernel and of LaTeX packages. This is tedious and error-prone, not least because there are packages that overwrite code of other packages in case those other packages are loaded, too.

You also need to know about .pdf-files and, e.g., about the concept of "named destinations".

Sorry for not describing concrete procedures.

With the current state of information about your project, describing a concrete procedure would not be easy.

Since such a project requires the adaptation of code, one must know exactly which TeX-engine and which code (documentclass, LaTeX packages etc.) shall be in use. One also needs to know about the features which the .pdf-files forming the "final products" shall provide. And about TeX-related programs besides TeX/LaTeX/pdfLaTeX available to you in your workflow. Probably the state of the \write18-feature/the features provided by the package shellesc might be of interest, too. (\write18/shellesc is for starting other programs from within TeX/LaTeX; since this can be considered a security-risk, \write18/shellesc nowadays usually is restricted or disabled by default.)

Knowledge about documentclass and packages etc is needed, e.g., for creating and fine-tuning a mechanism for combining and adjusting .toc-files for having an overall-table of contents. With most documentclasses LaTeX creates an auxiliary text file of extension .toc which contains data for the table of contents. Same for the list of tables (extension: .lot) and the list of figures (extension: .lof).
