[TeX/LaTeX] How to sequentially “feed” a LaTeX document from an external data source

automation, external files

I have a fairly complex C program that writes raw text output to text files, which are then printed on a matrix printer. These lists are now meant to receive a typographical update, so that they end up as a PDF that can be printed on a regular laser printer. For this I intend to use LaTeX. I cannot replace the C program; I can only modify its "printing/text output" functionality.

The general layout of the document to be generated is well-defined, but the number of rows of data is not.

So, in a first step I wrote a LaTeX "template" file, like this:

\documentclass{article}
\begin{document}
Dear --MRMRS-- --NAME--, we hope ...

My program then replaces the keywords --MRMRS-- and --NAME-- with the data it produces in raw form (in the end with a sed 's/KEYWORD/programoutput/' command). This works pretty well up to this point. As already mentioned, the length of some parts of the generated files is not known beforehand. For example, later in the document there will be tables with a well-defined structure, but these tables can differ in length each time, so I cannot simply define a fixed number of --ROWxCOLy-- keywords beforehand:

\begin{tabular}{cccc}
Col1 & Col2 & Col3 & Col4 \\
\hline
% now how to fill the content sequentially without knowing the size beforehand?
\end{tabular}

Thank you, we hope to hear from you on --DATE-- ...
\end{document}

The incoming data currently looks like this:

Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
COLSPANNED     LINE
Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
... variable amount of lines

There can be special lines that need to span the whole width of the table, but they come in a well-defined order.

The first solution that comes to mind is to hardcode LaTeX code directly in the data-generating program: one keyword stands for the whole table, which the program then generates with hardcoded LaTeX code. But I'd like to avoid this as much as possible for obvious maintenance reasons (keep layout and logic separated as much as possible).

What other possible solutions are there to feed a LaTeX document with data of well-defined structure but unknown length?

Best Answer

Automatic processing of data for TeX / LaTeX

The problem in your question is the term "sequentially". It is impossible to generate one TeX/LaTeX document continuously. A TeX/LaTeX document that produces valid output has a beginning and, more importantly, an end. When the document has finished without errors, the output is complete and the TeX/LaTeX job is done. You cannot "feed" data into that document once it is done.

What you can do is build a LaTeX frame document that reads the data with \input some_external_data, and regenerate the output every time the external data changes.

The point is: at the time the LaTeX job runs, the data you want to typeset has to be fixed in content and length, because the output document won't update itself later when the data changes unless the LaTeX job is run again. At least not without quite complex, customized methods that depend heavily on the viewer or output medium you use.

To improve your original approach

It might help to change the way you process the data:

Dear --MRMRS-- --NAME--, we hope ...

Here you use a self-defined template language to produce a TeX file. I think that is what you mean by "hard coded".

Just as it is good practice to separate output form (layout) and logic, it is also good practice to keep the generated data separate from the template for as long as possible, and to translate the set of information (the input, or the data) into a form that the next processor understands (the next processor is LaTeX in our case).

Let me show what I mean: LaTeX does not know what to do with --MRMRS-- and similar constructs, though the TeX machine is in general able to set up such a parser. But that would make things quite complex and hard to control and debug. So stay within the LaTeX language domain when you define your template:

Dear \MRMRS{} \NAME{}, we hope ...

Let's keep it simple and say that this is our whole pattern. Then the data set in text form might be:

Mrs
Moneypenny

The C program might translate this input data into a form that is known to the C language:

struct greeting {
    char* mrmrs;
    char* name;
};

Now the purpose of the C program (or whatever tool you use) is to translate that into the LaTeX language:

\def\MRMRS{Mrs}
\def\NAME{Moneypenny}
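
The translation step could be sketched in C roughly as follows. This is only a minimal sketch under some assumptions: the function names write_latex_escaped, write_def and write_greeting_defs are made up for illustration, and the escaping covers only the usual LaTeX special characters in running text, which may not be enough for your real data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the struct shown above. */
struct greeting {
    char *mrmrs;
    char *name;
};

/* Write s to out, escaping characters that LaTeX treats specially in text.
   (Hypothetical helper; extend it if your data needs more cases.) */
static void write_latex_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; ++s) {
        if (strchr("#$%&_{}", *s) != NULL) {
            /* These only need a leading backslash. */
            fputc('\\', out);
            fputc(*s, out);
        } else if (*s == '\\') {
            fputs("\\textbackslash{}", out);
        } else if (*s == '~') {
            fputs("\\textasciitilde{}", out);
        } else if (*s == '^') {
            fputs("\\textasciicircum{}", out);
        } else {
            fputc(*s, out);
        }
    }
}

/* Emit one definition line; macro_name is given without the backslash. */
static void write_def(FILE *out, const char *macro_name, const char *value)
{
    fprintf(out, "\\def\\%s{", macro_name);
    write_latex_escaped(out, value);
    fputs("}\n", out);
}

/* Translate the C representation into LaTeX definitions. */
void write_greeting_defs(FILE *out, const struct greeting *g)
{
    assert(out != NULL && g != NULL);
    write_def(out, "MRMRS", g->mrmrs);
    write_def(out, "NAME", g->name);
}
```

Called with an output file opened by the C program (the file name is up to you), this writes exactly the kind of definition lines shown above, so the C side only knows macro names and values and nothing about the layout.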

You can now read the processed data into the LaTeX document, which is a true LaTeX program and not a template to be processed by some other tool, and every step from the raw data to the output document can be debugged separately from the other steps.
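
The same idea might carry over to the variable-length table from the question: instead of a fixed number of keywords, the C program could write one macro call per data line. The following is a sketch only. The macro names \datarow and \spanrow, the struct table_row and the function write_table_rows are assumptions made for illustration, and the two macros would have to be defined once in the LaTeX frame document, where the layout decisions belong.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCOLS 4

/* One line of table data (hypothetical layout): either NCOLS cells,
   or a single full-width text when spanning is not NULL. */
struct table_row {
    const char *cells[NCOLS];
    const char *spanning;
};

/* Write s to out, escaping characters that LaTeX treats specially in text. */
static void write_latex_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; ++s) {
        if (strchr("#$%&_{}", *s) != NULL) {
            fputc('\\', out);
            fputc(*s, out);
        } else if (*s == '\\') {
            fputs("\\textbackslash{}", out);
        } else if (*s == '~') {
            fputs("\\textasciitilde{}", out);
        } else if (*s == '^') {
            fputs("\\textasciicircum{}", out);
        } else {
            fputc(*s, out);
        }
    }
}

/* Emit one macro call per row; the number of rows is only known at run time. */
void write_table_rows(FILE *out, const struct table_row *rows, size_t nrows)
{
    assert(out != NULL && (rows != NULL || nrows == 0));
    for (size_t i = 0; i < nrows; ++i) {
        if (rows[i].spanning != NULL) {
            /* Full-width line: one argument. */
            fputs("\\spanrow{", out);
            write_latex_escaped(out, rows[i].spanning);
            fputs("}\n", out);
        } else {
            /* Regular line: one braced argument per column. */
            fputs("\\datarow", out);
            for (int c = 0; c < NCOLS; ++c) {
                fputc('{', out);
                write_latex_escaped(out, rows[i].cells[c]);
                fputc('}', out);
            }
            fputc('\n', out);
        }
    }
}
```

With this split, the generated file contains no alignment characters, rules or column specifications at all. How a regular row and a spanning row actually look is decided by the definitions of the two macros in the frame document, so the layout can change without touching the C program.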