[TeX/LaTeX] LaTeX for automatic report generation

I accept that this question might be a bit outside the scope of this page, but please bear with me!

At work I'm going to set up a framework for generating some pretty basic reports, which could be on a monthly or quarterly basis. The data source is an online reporting system which generates .csv output.

Of course, this could be done fairly manually: make a simple Excel file with inputs for the numbers and outputs for the results, then copy and paste the results into a Word document template. Frankly, it feels like this procedure could and should be fully automated. I would therefore like to use this opportunity to learn how to build a more thorough tool, which should of course generate a LaTeX-typeset PDF. However, I don't really understand how to approach this problem. There are several questions on this site about automatically generating LaTeX documents, but I don't find that they answer my questions about a fully automated procedure.

My needs are as follows:

  • A simple interface to choose a monthly or quarterly basis for the data manipulations
  • Including the LaTeX distribution "in the program", or calling an online generator (I can't be sure that future users of the program will even know about LaTeX)
  • Doing some simple calculations with the numbers from the data source
  • Inserting these numbers into a .tex file, compiling it and making the PDF available to the user

Example workflow:
In a simple GUI, the user chooses to generate the report for January 2016 and presses "Generate report". The program then imports the January 2016 numbers from a .csv file, does some very basic calculations, pushes the results into a premade LaTeX template, compiles it and outputs the PDF.
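The workflow above boils down to three steps: pick a period, filter the CSV rows that belong to it, and compute a few figures. A minimal Python sketch, assuming the export has a month column in YYYY-MM form and a numeric amount column; both column names and the "2016-Q1" period syntax are hypothetical placeholders for whatever the reporting system actually produces:

```python
import csv
import io


def months_for_period(period: str) -> list[str]:
    """Map '2016-01' (monthly) or '2016-Q1' (quarterly) to 'YYYY-MM' keys."""
    year, part = period.split("-")
    if part.upper().startswith("Q"):
        quarter = int(part[1:])
        if not 1 <= quarter <= 4:
            raise ValueError(f"invalid quarter: {period}")
        first = 3 * (quarter - 1) + 1
        return [f"{year}-{m:02d}" for m in range(first, first + 3)]
    month = int(part)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {period}")
    return [f"{year}-{month:02d}"]


def summarize(csv_text: str, period: str) -> dict:
    """Count the rows in the period and sum/average the hypothetical 'amount' column."""
    wanted = set(months_for_period(period))
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["month"] in wanted]
    total = sum(float(r["amount"]) for r in rows)
    return {
        "rows": len(rows),
        "total": total,
        "average": total / len(rows) if rows else 0.0,
    }
```

For example, summarize(data, "2016-Q1") adds up the January to March rows and summarize(data, "2016-01") only the January ones, so the same code serves both report types.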

I have some very concrete questions with respect to this:

  1. Is there a programming language/tool I could use as a basis? For instance a templating language such as Cheetah, or do I need something completely different, like Excel?
  2. Is it possible to include a basic LaTeX engine, with a few carefully selected packages, in such a program, or to call an online compiler or similar? It is important that the TeX distribution doesn't need to be installed on the computer the program is run from. I assume the program will be located in a folder on a server.
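To make question 1 more concrete: for simple reports, even Python's standard-library string.Template can serve as the templating layer, and its $name placeholders do not clash with LaTeX's braces the way str.format() placeholders do. A minimal sketch; the template text and the placeholder names are made up for illustration:

```python
from string import Template

# Hypothetical report template; in practice it would live in its own .tex file.
# $-placeholders are used because LaTeX's { } would clash with str.format().
REPORT_TEMPLATE = Template(r"""\documentclass{article}
\begin{document}
\section*{Report for $period}
Total: $total\par
Average: $average
\end{document}
""")


def render_report(period: str, total: float, average: float) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled,
    # which is safer than silently producing a half-empty report.
    return REPORT_TEMPLATE.substitute(
        period=period,
        total=f"{total:.2f}",
        average=f"{average:.2f}",
    )
```

substitute() inserts the values verbatim, so any text that could contain LaTeX special characters (&, %, _ and so on) should be escaped before it is passed in.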

Best Answer

Holene,

I have done something similar in the past. For example, I once organised a scientific meeting. There was a website where attendees could register by filling in an HTML form. The submitted data was stored in a database (at first a simple text file, later an SQL database).

The HTML page with the registration form did some checks to ensure that the data entered was plausible. Afterwards, the data was stored in the database, a registration letter was typeset by LaTeX using the data from the HTML form, and the resulting PostScript file was sent automatically to a nearby printer.
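One detail matters whenever form input ends up in a LaTeX document: characters such as &, %, $, # or _ have a special meaning in LaTeX and must be escaped, or the letter fails to compile (or typesets something unintended). The original scripts were PHP; this is a Python sketch of such an escaping step, not the author's code:

```python
# Replacements for LaTeX's ten special characters. Each input character is
# replaced in a single pass, so braces introduced by \textbackslash{} and
# friends are never escaped a second time.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape user-supplied text so LaTeX typesets it literally."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)
```

For example, a form entry like R&D 50% #1 becomes R\&D 50\% \#1, which can be dropped safely into the letter template.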

Later, after registration was complete, I used the contents of the database to create a list of attendees (sorted either alphabetically or by ZIP code). To do so, I created some LaTeX macros which helped me get everything into a neat, representative layout. The biggest advantage is that the script you use to extract the data from your database does not need to know anything about the final layout ...
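The separation between data and layout might look like this: the extraction script only decides the order and emits one macro call per attendee, while the macro definition in the style file decides what an entry looks like. A Python sketch with a hypothetical \attendee macro; the fields are assumed to be escaped for LaTeX already:

```python
from dataclasses import dataclass


@dataclass
class Attendee:
    last: str
    first: str
    zip_code: str
    city: str


# The two sort orders mentioned in the answer: alphabetical or by ZIP code.
SORT_KEYS = {
    "name": lambda a: (a.last.casefold(), a.first.casefold()),
    "zip": lambda a: (a.zip_code, a.last.casefold()),
}


def attendee_list(attendees: list[Attendee], sort_by: str = "name") -> str:
    """Emit one call of a hypothetical \\attendee macro per person.

    The script knows nothing about the layout; it only passes the fields
    as macro arguments in the requested order.
    """
    key = SORT_KEYS.get(sort_by)
    if key is None:
        raise ValueError(f"unknown sort order: {sort_by}")
    return "\n".join(
        f"\\attendee{{{a.last}}}{{{a.first}}}{{{a.zip_code}}}{{{a.city}}}"
        for a in sorted(attendees, key=key)
    )
```

Changing the look of the list then means editing only the \attendee definition; the script stays untouched.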

I also used the same database entries to create name badges for the attendees. Another LaTeX macro was able to detect whether a person was a normal attendee, a speaker or a member of the local organising team (or even all three) :-).
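On the data side, the badge script only has to tell the macro which roles apply; deciding what each role looks like stays in LaTeX. A Python sketch in which a person can hold any combination of roles; the \badge macro and its comma-separated role list are hypothetical, not the macro from the answer:

```python
from enum import Flag, auto


class Role(Flag):
    ATTENDEE = auto()
    SPEAKER = auto()
    ORGANISER = auto()


def badge(name: str, roles: Role) -> str:
    """Emit a hypothetical \\badge{name}{role,role,...} macro call."""
    if not roles:
        raise ValueError("every badge needs at least one role")
    # Iterate over Role in definition order so the output is stable.
    keys = [r.name.lower() for r in Role if r in roles]
    return f"\\badge{{{name}}}{{{','.join(keys)}}}"
```

A person who is all three at once simply gets badge(name, Role.ATTENDEE | Role.SPEAKER | Role.ORGANISER).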

All those scripts were written in PHP; they did all the work: displaying the HTML page, communicating with the SQL database, generating the temporary LaTeX documents and calling LaTeX to produce the PS files.

Today, I use the export from our students' learning platform (if you are curious: stud.IP) to generate attendee lists for my courses. The system exports this list as a CSV file. I wrote a Bash script which reads the CSV file and uses sed and awk to delete the first 3 lines, filter out the information I am interested in, and output the result as a complete LaTeX file, ready to be compiled by pdfLaTeX.

This time, I even thought about writing a class of my own. In the end, I again wrote a style file, which takes three package options. By means of these options, I can change the layout of one and the same LaTeX command to produce three completely different outputs: an attendee list, a big name plate to be placed in front of each student and, last but not least, a certificate to be handed out after the course has been successfully completed.
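The Bash/sed/awk pipeline described above translates directly into other languages. A Python sketch of the same steps (skip the first three lines, pick the interesting columns, write a complete LaTeX file); the column names, the \participant macro and the attendees package with its list/nameplate/certificate options are hypothetical stand-ins, since the answer does not name them:

```python
import csv

# Hypothetical option names for the three-way style file described above.
MODES = {"list", "nameplate", "certificate"}


def csv_to_latex(export_text: str, mode: str, skip_lines: int = 3,
                 delimiter: str = ",") -> str:
    """Turn a CSV export into a complete LaTeX document for one output mode."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    # Line-based skip of the export's preamble, like `sed '1,3d'`.
    lines = export_text.splitlines()[skip_lines:]
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Keep only the columns of interest (the awk step); skip rows without a name.
    entries = [
        f"\\participant{{{row['first_name']}}}{{{row['last_name']}}}"
        for row in reader
        if row.get("last_name")
    ]
    return "\n".join([
        r"\documentclass{article}",
        # The package option alone selects list, name plate or certificate.
        rf"\usepackage[{mode}]{{attendees}}",
        r"\begin{document}",
        *entries,
        r"\end{document}",
        "",
    ])
```

Calling it three times with the same export and a different mode yields the three documents, while the LaTeX command in the body stays identical.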

What I wanted to make clear is that you are absolutely right to use LaTeX to produce automatically generated PDF files. To me, that makes perfect sense.

It only depends on your working environment, the tools you have at hand and the tools you have mastered. Unfortunately, you didn't give any specific hints about those three things. Therefore, my answers are:

  1. Yes, there is. In fact, there is almost certainly more than one language you could use. It only depends on which platform you use, which languages are available (bash, sed, awk, Perl, PHP, Lisp, C, C++, C#, ...) and which languages you are skilled and trained in.
  2. I wouldn't incorporate LaTeX into your program. Instead, I would have your script or program make a system call to do the dirty LaTeX job. For example, if you are working in an ordinary shell (e.g. Bash, sh, csh, ... on Linux and macOS, or PowerShell or even Bash on newer Windows systems), you can simply run pdflatex jan2016-report.tex and it will produce the corresponding PDF output for you.
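The system call from point 2 can be made from almost any language. A Python sketch, assuming pdflatex is installed and on the PATH of whichever machine runs the script; the function name and the fixed report.tex job name are illustrative choices:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def typeset(tex_source: str, pdf_path: Path, engine: str = "pdflatex") -> Path:
    """Compile LaTeX source to a PDF by calling the TeX engine as an external program.

    The job runs in a throw-away directory, so .aux and .log files never
    clutter the report folder and parallel runs cannot overwrite each other.
    """
    with tempfile.TemporaryDirectory() as workdir:
        tex_file = Path(workdir) / "report.tex"
        tex_file.write_text(tex_source, encoding="utf-8")
        result = subprocess.run(
            [engine, "-interaction=nonstopmode", "-halt-on-error", tex_file.name],
            cwd=workdir,
            capture_output=True,
            text=True,
            errors="replace",  # TeX output is not guaranteed to be valid UTF-8
        )
        if result.returncode != 0:
            # pdflatex writes its diagnostics to stdout, not stderr.
            raise RuntimeError(f"{engine} failed:\n{result.stdout[-2000:]}")
        pdf_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(Path(workdir) / "report.pdf", pdf_path)
    return pdf_path
```

The -interaction=nonstopmode and -halt-on-error options keep pdflatex from waiting for keyboard input when the template contains an error. A document with cross-references or a table of contents needs a second run, or a driver such as latexmk.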

What you should do: examine carefully what kind of data you can extract from your database. Try to build the extraction routine so that all the output from the database (the CSV file you mentioned) is formatted exactly the same way, regardless of whether you are building a monthly or a quarterly report. Then try to split the relevant data into small portions that can be handled easily by LaTeX macros (which can take up to nine arguments). Try to write some macros of your own, providing helpful LaTeX commands and environments. The only thing left to do is to write your reporting script so that it adds those LaTeX macros and environments to the results your database query gave you.
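The last two steps, splitting the data into macro-sized portions and having the script write the macro calls, could be sketched in Python as follows. The \reportrow macro and the record fields are hypothetical; the check enforces LaTeX's limit of nine arguments per macro:

```python
def macro_call(name: str, *args: object) -> str:
    """Format a call to a (hypothetical) user-defined LaTeX macro.

    A macro defined with \\newcommand takes at most nine arguments (#1 to #9),
    so larger records must be split across several macros.
    """
    if not name.isalpha():
        raise ValueError(f"invalid macro name: {name!r}")
    if len(args) > 9:
        raise ValueError(f"\\{name} would need {len(args)} arguments; LaTeX allows at most 9")
    return "\\" + name + "".join("{" + str(a) + "}" for a in args)


def report_rows(records: list[dict]) -> list[str]:
    # Each record becomes one \reportrow{label}{value}{unit} call; monthly and
    # quarterly runs produce exactly the same shape of output.
    return [
        macro_call("reportrow", r["label"], f"{r['value']:.1f}", r["unit"])
        for r in records
    ]
```

Because both report types emit the same calls, the LaTeX side needs only a single \newcommand{\reportrow}[3]{...} definition to control the layout of every row.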

Have fun

Jan