[Tex/LaTex] How to sequentially “feed” a LaTeX document from external data source


I have a fairly complex C program that can generate raw text output to text files, which are then printed on a matrix printer. These lists are now to get a typographical update and end up as PDFs that can be printed on a regular laser printer. For this I intend to use LaTeX. I cannot replace the C program; I can only modify its "printing/text output" functionality.

The general layout of the document to be generated is well-defined, but the number of rows of data is not.

So, in a first step I wrote a LaTeX "template" file, like this:

\documentclass{article}
\begin{document}
Dear --MRMRS-- --NAME--, we hope ...

I let my program replace the keywords --MRMRS-- and --NAME-- with the raw data it produces (in the end with a sed 's/KEYWORD/programoutput/' command). This works pretty well so far. As already mentioned, however, the length of some parts of the generated files is not known beforehand. For example, later in the document there will be tables with a well-defined structure, but their length can differ each time, so I cannot simply define a fixed number of --ROWxCOLy-- keywords beforehand:

\begin{tabular}{cccc}
Col1 & Col2 & Col3 & Col4 \\
\hline
% now how to fill the content sequentially without knowing the size beforehand?
\end{tabular}

Thank you, we hope to hear from you on --DATE-- ...
\end{document}

The incoming data currently looks like this:

Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
COLSPANNED     LINE
Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
... variable number of lines

There can be special lines that must span all columns (the whole table width), but they appear in a well-defined order.

The first solution that comes to mind is to hardcode LaTeX code directly in the data-generating program: use one keyword for the whole table and let the program replace it with generated LaTeX code. But I'd like to avoid this as much as possible for obvious maintenance reasons (keeping layout and logic separated as much as possible).

What other solutions are there for feeding a LaTeX document with data that has a well-defined structure but an unknown length?

Best Answer

Automatic processing of data for TeX / LaTeX

The problem in your question is the term "sequentially": a TeX/LaTeX document cannot be generated continuously. A TeX/LaTeX document that produces valid output has a beginning and, more importantly, an end. Once the document has been processed without errors, the output is complete and the TeX/LaTeX job is done. You cannot "feed" data into that document afterwards.

What you can do is build a LaTeX frame document that reads the external data with \input{some_external_data}, and rerun LaTeX every time the external data changes.
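A minimal sketch of such a frame document, applied to the table from the question. It assumes the C program writes a file rows.tex that defines a macro \TableRows built from calls to two hypothetical macros, \DataRow (one four-column row) and \SpanRow (one full-width line). The frame document decides what those rows look like, so no layout code ends up in the C program. The filecontents* environment only stands in for the generated file so that the example compiles on its own; its [overwrite] option needs LaTeX 2019-10-01 or newer.

\begin{filecontents*}[overwrite]{rows.tex}
% Stand-in for the file the C program would write (hypothetical names).
% It only says what each row contains, not how it is typeset.
\def\TableRows{%
  \DataRow{a1}{b1}{c1}{d1}%
  \DataRow{a2}{b2}{c2}{d2}%
  \SpanRow{A line spanning the full width}%
  \DataRow{a3}{b3}{c3}{d3}%
}
\end{filecontents*}
\documentclass{article}

% Layout lives in the frame document, not in the generated data.
\newcommand{\DataRow}[4]{#1 & #2 & #3 & #4 \\}
\newcommand{\SpanRow}[1]{\multicolumn{4}{c}{#1} \\}

% Read the generated data; it must be complete when this job runs.
\input{rows.tex}

\begin{document}
\begin{tabular}{cccc}
Col1 & Col2 & Col3 & Col4 \\
\hline
\TableRows
\end{tabular}
\end{document}

Reading the rows through a macro defined in the preamble, rather than placing \input inside the tabular, avoids having to worry about how \input behaves at the start of a table cell. However many \DataRow and \SpanRow calls the program writes, the frame document stays the same.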

The key point is this: when the LaTeX job runs, the data you want to typeset must already be complete in content and length. The output document won't update automatically when the data changes later; you have to run the LaTeX job again. The only alternatives are quite complex, customized methods that depend heavily on the viewer or output medium you use.

To improve your original approach

It might help to change the way you process the data. Consider this line of your template:

Dear --MRMRS-- --NAME--, we hope ...

Here you use a self-defined template language to produce a TeX file. I think that is what you mean by "hard coded".

Just as it is good practice to separate output form (layout) from logic, it is also good practice to keep generated data separate from the template for as long as possible, and to translate the information (the input data) into a form the next processor understands (in our case, the next processor is LaTeX).

To show what I mean: LaTeX does not know what to do with --MRMRS-- and similar constructs, though TeX is in principle able to implement such a parser. But that would make things quite complex and hard to control and debug. So stay within the LaTeX language when you define your template:

Dear \MRMRS{} \NAME{}, we hope ...

To keep it simple, let's say that is our whole template. Then the data set in text form might be

Mrs
Moneypenny

The C program might read this input data into a C data structure:

struct greeting {
    char* mrmrs;
    char* name;
};

The job of the C program (or whatever tool you use) is then to translate that into the LaTeX language:

\def\MRMRS{Mrs}
\def\NAME{Moneypenny}

You can now read the processed data into the LaTeX document (for example with \input). That document is a true LaTeX program, not a template to be processed by some other tool, and every step from the raw data to the output document can be debugged separately from the others.
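A sketch of the reading side, assuming the C program writes the two definitions above to a file called greeting.tex (the file name is hypothetical). \InputIfFileExists reads the file if it exists and otherwise defines visible placeholders, so the frame document can be compiled and checked without running the C program first. Keep in mind that the data written by the program must be valid LaTeX too: characters such as &, %, $, #, _, {, } and \ in a name would have to be escaped by the program before being written.

\documentclass{article}

% greeting.tex is the hypothetical file the C program writes from its
% struct greeting, containing for example:
%   \def\MRMRS{Mrs}
%   \def\NAME{Moneypenny}
% If it does not exist yet, define visible placeholders instead, so this
% frame document can be compiled and checked on its own.
\InputIfFileExists{greeting.tex}{}{%
  \def\MRMRS{[MRMRS]}%
  \def\NAME{[NAME]}%
}

\begin{document}
Dear \MRMRS{} \NAME{}, we hope ...
\end{document}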

Related Solutions

[Tex/LaTex] Put from external file

This may meet your needs. I found it at File input and output; I'm reproducing it (somewhat simplified) here:

With this sample fileOut.txt:

first line, with a \TeX{} macro to expand
second line
third line

the code

\documentclass{article}

\newread\file
\openin\file=fileOut.txt

\newcommand{\getnextline}{%
 \read\file to\fileline % Reads a line of the file into \fileline
 \fileline % display it
}

\begin{document}

First line of fileOut.txt: 

\getnextline{}

More text here, then second line: \getnextline{}

Third line: \getnextline{}

Read past end of file? \getnextline{}
\closein\file

\end{document}

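produces a document in which each \getnextline{} is replaced by the next line of fileOut.txt (with \TeX{} typeset as the TeX logo); the last call, which reads past the end of the file, adds nothing visible.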

I hope this works for you in XeTeX with Arabic text.

Edit to answer the OP's further questions:

Since the example here works, the open and close statements are where they belong. Good programming practice (in any language) would call for checks that the file exists and is readable. I didn't do that since this answer is just a proof of concept. If you are in complete control of your environment and know the file will always be where it's expected, you need not bother.

I don't know what happens if you don't close an open file. I do know that TeX limits the number of files you can have open for reading at the same time (classic TeX has 16 input streams, numbered 0 to 15). Some applications aren't good about releasing file handles by themselves, so it's a good habit to close them yourself.

I was curious about what would happen if I read past the end of the file. The example shows that reading a line that's not there just returns an empty string. You need to decide whether that's acceptable in your use case. If not, test for end of file with \ifeof and act accordingly.
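One way to act on end of file is sketched below: a macro that reads every line of a file in a loop and stops at the end, using \ifeof both to check that the file could be opened and to detect the end. The names \printlines and \datafile are my own; \unless is an e-TeX primitive, which pdfLaTeX, XeLaTeX and LuaLaTeX all provide. As in the example above, each line is read as TeX input, so characters such as # or unbalanced braces in the data would cause problems.

\documentclass{article}

\newread\datafile

% \printlines{<file>}: typeset every line of <file> as its own paragraph.
% \ifeof right after \openin is true if the file could not be opened;
% inside the loop it becomes true once a read went past the last line,
% and that final read is discarded.
\newcommand{\printlines}[1]{%
  \openin\datafile=#1\relax
  \ifeof\datafile
    \textbf{Cannot read #1.}%
  \else
    \loop
      \read\datafile to\fileline
      \unless\ifeof\datafile
        \fileline\par
      \fi
    \unless\ifeof\datafile % keep looping while not at end of file
    \repeat
    \closein\datafile
  \fi}

\begin{document}
\printlines{fileOut.txt}
\end{document}

This also covers the existence check mentioned earlier: if fileOut.txt is missing, the document still compiles and says so instead of printing empty lines.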

[Tex/LaTex] Read integers from external data file to document programming code in LaTeX

In my opinion, it would be simpler and more robust to put listings' linerange markers in comments of your m-file. I used the matlab-prettifier package instead of mcode, but the approach should work with either.

Basically, you define a prefix for those linerange markers with the rangeprefix key; here, the prefix I use is simply a percent character (%) followed by a space character:

rangeprefix=\%\ ,

Note that both of those characters must be escaped here (but not in your MATLAB code). Then, in your code, you use a pair of descriptive strings to mark the start and end of each of the ranges of interest. In my example, I used param and endparam for the first range of interest, and param2 and endparam2 for the second range of interest. Make sure to use

includerangemarker=false

if you don't want the markers themselves to appear in the output.
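A small, self-contained sketch of the marker approach, written with plain listings (language=Matlab) rather than matlab-prettifier. The file name demo.m and the marker names params/endparams and loop/endloop are made up for the illustration; I chose names where no marker line starts with another marker, to stay on the safe side of how listings matches them. filecontents* only writes the sample m-file so that the example compiles by itself (the [overwrite] option needs LaTeX 2019-10-01 or newer).

\begin{filecontents*}[overwrite]{demo.m}
% params
a = 1;
b = 2;
% endparams
x = a + b;
% loop
for k = 1:3
    disp(k)
end
% endloop
\end{filecontents*}
\documentclass{article}
\usepackage{listings}

% Markers are "% <name>" lines in the m-file; both % and the space
% must be escaped here. includerangemarker=false hides the marker lines.
\lstset{
  language=Matlab,
  rangeprefix=\%\ ,
  includerangemarker=false,
}

\begin{document}
Parameters:
\lstinputlisting[linerange=params-endparams]{demo.m}

Main loop:
\lstinputlisting[linerange=loop-endloop]{demo.m}
\end{document}

If lines are later added to or removed from the m-file, the excerpts follow the markers, which is the robustness advantage over hard-coded line numbers.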

You can find more details about those linerange markers in subsection 5.7 of the listings manual.