[TeX/LaTeX] How to merge data and create multiple documents, similar to personalized mailings?

automation, csv, database, excel, python

I would like to use LaTeX combined with Excel data files (or CSV if necessary) to create "proof of payment" documents.

Suppose I have a list of people:

John McEnroe, NYC (USA), 110.23$, 10$,May……
Mary Rose, London, 567.89$, 50$,May………
…………………………………..

and what I would like to obtain is a document similar to this:
[image]
where you can see the fields of each record of the database.

The problem is that I don't know how to create this.

Could you recommend any packages or tips for creating many such documents?

Thank you.

I know that it's not a concrete question. It's an SOS call asking for help.

Best Answer

As the mightiest tool in my toolbox is Python, I see Python problems everywhere. (The "if my only tool is a hammer, everything looks like a nail" problem.)

The idea is the following:

A data file: Excel, CSV, a MySQL database or whatever you like.
A LaTeX template with placeholders. I used @@key, as two @ characters in a row should never occur in a normal document, but you can choose your own.
A Python script that fills the placeholders for each row in the data and calls LaTeX to produce the result.
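The placeholder step can be sketched on its own. This is a minimal illustration, assuming keys consist only of letters, digits and underscores; the helper name fill_placeholders is hypothetical. A single regular-expression pass cannot confuse a key such as @@first with @@firstname, and it raises an error if the template mentions a key that the data does not provide.

```python
import re

# Matches "@@" followed by a key made of letters, digits or underscores.
PLACEHOLDER = re.compile(r'@@(\w+)')


def fill_placeholders(template, values):
    """Replace every @@key in template with str(values[key]).

    Hypothetical helper for illustration; raises KeyError if the
    template mentions a key that is missing from values.
    """
    def substitute(match):
        return str(values[match.group(1)])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == '__main__':
    row = {'productID': 1, 'firstname': 'Jules', 'lastname': 'Winnfield'}
    print(fill_placeholders(r'\item @@firstname @@lastname (@@productID)', row))
```

Any mapping works as values, so a dictionary or a pandas row can be passed in unchanged.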

Apart from the standard library, you only need pandas for the data munching part. If you want to learn Python in the near future and have a scientific background, which I assume if you are using Mathematica, I recommend installing Python via the Anaconda distribution. It comes bundled with nearly all scientific modules and prebuilt dependencies:

http://continuum.io/downloads#py34

This is my data.csv:

productID,firstname,lastname,date
1,Jules,Winnfield,2015-01-01
2,Vincent,Vega,2015-01-02
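If pandas is not available, the csv module from the standard library can read a file like this one. The following is only a sketch: the function name read_rows is made up for illustration, and every value comes back as a string.

```python
import csv
import io


def read_rows(lines):
    """Return one dict per data row, keyed by the header line.

    `lines` is any iterable of text lines, for example an open file.
    Hypothetical helper; all values are returned as strings.
    """
    return list(csv.DictReader(lines))


if __name__ == '__main__':
    # Same content as data.csv, inlined so the sketch runs on its own.
    sample = io.StringIO(
        'productID,firstname,lastname,date\n'
        '1,Jules,Winnfield,2015-01-01\n'
        '2,Vincent,Vega,2015-01-02\n'
    )
    for row in read_rows(sample):
        print(row['firstname'], row['lastname'], row['date'])
```

For a real file, open it with open('data.csv', newline='', encoding='utf8') and pass the file object to read_rows.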

This is my really simplistic template.tex:

\documentclass{scrartcl}

\usepackage{fontspec}

\begin{document}

\begin{itemize}
  \item @@productID
  \item @@firstname
  \item @@lastname
\end{itemize}

\end{document}
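The values substituted for the placeholders become LaTeX source, so characters such as $ or & in the data (the amounts in the question contain $) would be interpreted by TeX instead of printed. Below is a minimal sketch of an escaping step for text-mode output. The helper name escape_latex and the chosen replacements are my own illustration, not part of the original answer.

```python
import re

# Replacement text for the ten characters TeX treats specially in text mode.
LATEX_SPECIALS = {
    '\\': r'\textbackslash{}',
    '&': r'\&',
    '%': r'\%',
    '$': r'\$',
    '#': r'\#',
    '_': r'\_',
    '{': r'\{',
    '}': r'\}',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
}

# One alternation over all special characters, each escaped for the regex.
SPECIAL_PATTERN = re.compile('|'.join(re.escape(c) for c in LATEX_SPECIALS))


def escape_latex(value):
    """Return str(value) with TeX special characters made printable.

    Hypothetical helper; a single regex pass means the braces added by
    one replacement are never escaped again by another.
    """
    return SPECIAL_PATTERN.sub(lambda m: LATEX_SPECIALS[m.group(0)], str(value))


if __name__ == '__main__':
    print(escape_latex('110.23$'))
    print(escape_latex('Smith & Sons'))
```

Apply it to each value just before substitution, and skip it for fields that deliberately contain LaTeX markup.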

And this is the Python script; please ask if something is not clear:

# codecs provides input/output with a specific encoding
import codecs
# os includes operating system operations
import os
# shutil provides a portable way to delete a folder tree
import shutil
# this is needed to call latex from python
from subprocess import call
# pandas for data munching
import pandas

# create folders, no error if they already exist
os.makedirs('tmp', exist_ok=True)
os.makedirs('output', exist_ok=True)

# read in the template:
with codecs.open('template.tex', encoding='utf8') as f:
    template = f.read()

data = pandas.read_csv('data.csv')
# show the first 5 rows in the data to have a quick look
print(data.head())

# these are the keys we want to replace with our data:
keys = [
    'productID',
    'firstname',
    'lastname',
]

# now we loop over each row to create a pdf with the data
for index, row in data.iterrows():
    filled = template
    for key in keys:
        # replace our placeholder with the actual data, cast to string first
        filled = filled.replace('@@' + key, str(row[key]))

    # create a hopefully unique filename
    filename = 'filled_{}_{}_{}'.format(
        row.lastname,
        row.firstname,
        row.date,
    )
    # now we write the filled template to the tmp folder
    with codecs.open('tmp/' + filename + '.tex', 'w', encoding='utf8') as f:
        f.write(filled)

    # and call lualatex or any other latex compiler
    # call takes a list of arguments
    call(['lualatex',
          '--interaction=batchmode',
          '--output-directory=tmp',
          'tmp/' + filename + '.tex',
          ])

    # the output of the latex call does not end with a newline
    print('\n')

    # now move the file to the output folder, overwriting an older copy:
    os.replace('tmp/' + filename + '.pdf', 'output/' + filename + '.pdf')

# now we delete the tmp folder (shutil.rmtree also works where rm is missing)
shutil.rmtree('tmp')

pandas also provides read_excel, read_sql_table, read_sql_query and many more: http://pandas.pydata.org/pandas-docs/stable/io.html
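The script above ignores the exit status of the compiler, so a failed run only shows up when the PDF turns out to be missing. Here is a hedged sketch of a stricter call, using the same lualatex options as above. The function names latex_command and compile_tex are hypothetical, and the sketch assumes lualatex is on the PATH.

```python
import subprocess


def latex_command(tex_path, output_dir, compiler='lualatex'):
    """Build the argument list for one compiler run.

    Uses the same options as the script above; helper name is hypothetical.
    """
    return [
        compiler,
        '--interaction=batchmode',
        '--output-directory=' + output_dir,
        tex_path,
    ]


def compile_tex(tex_path, output_dir):
    """Run the compiler and raise RuntimeError if it reports failure."""
    result = subprocess.run(latex_command(tex_path, output_dir))
    if result.returncode != 0:
        raise RuntimeError('LaTeX failed for ' + tex_path +
                           ', see the .log file in ' + output_dir)


if __name__ == '__main__':
    # Only print the command, so the sketch runs without a TeX installation.
    print(latex_command('tmp/filled_Vega_Vincent_2015-01-02.tex', 'tmp'))
```

Calling compile_tex inside the loop stops the batch at the first broken record instead of failing later when the PDF is moved.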

Related Solutions

[Tex/LaTex] Is it possible to connect a database to LaTeX to produce data-driven documents

You may try latexdb. It connects LaTeX to a MySQL database.

[Tex/LaTex] How to create form letters from spreadsheet data in ConTeXt

As far as I can tell, there is nothing built in. But, as is often the case, you can get there by combining existing functionality. All you need is a CSV parser; buffers can do the rest. (I modified the interface a bit so you can simply write \insert[Field Name] instead of \insertFieldName.) The usage is as follows:

Define a template. In the revised form, your example code would look like this:

\startcsvtemplate [tpl]
Dear \insert[Name],

You owe \insert[Amount]. Please send it before \insert[Date].
\par
\stopcsvtemplate

Trailing endlines are stripped, so you will have to request paragraphs explicitly.
Define an input buffer (optional): Input can be read from a file or from a buffer. In the latter case, the buffer needs to be defined, just like any other buffer:

\startbuffer[csdata]
Name,Amount,Date
"Mr. White","\\letterdollar 300","Dec. 2, 1911"
"Mr. Brown","\\letterdollar 300","Dec. 3, 1911"
"Ms. Premise","\\letterdollar 42","Dec. 4, 1911"
"Ms. Conclusion","\\letterdollar 23","Dec. 5, 1911"
\stopbuffer

Request the input to be parsed: Depending on whether you chose to read the data from a buffer or from a file, you will have to process it using the appropriate command:

\processcsvbuffer[one][csdata]
\processcsvfile[two][test.csv]

The first argument of either command is the id by which the dataset can be referenced later (similar to \useexternalfigure[a_cow][cow.pdf]).
Now that dataset and template are in place, you can use them together in a job definition:

\definecsvjob [testing] [
  data=two,
  template=tpl,
]

This will generate a macro \testing which you can use in your document to generate the output.

\starttext
  \testing
\stoptext

NB: The code below can (and probably should, if used frequently) be improved by defining some template language and moving the string processing to Lua entirely. As it is, performance will be poor due to the repeated calls to Lua from TeX.

Example output.

% macros=mkvi

\unprotect
\startluacode
  local datasets = { }

  local buffersraw   = buffers.raw
  local context      = context
  local ioloaddata   = io.loaddata
  local lpegmatch    = lpeg.match
  local stringformat = string.format
  local stringmatch  = string.match
  local stringsub    = string.sub
  local tableconcat  = table.concat
  local tableswapped = table.swapped

  local die = function (msg) print(msg or "ERROR") os.exit(1) end

  local csv_parser
  do
    --- This is (more than) an RFC 4180 parser.
    --- https://www.rfc-editor.org/rfc/rfc4180
    local C, Cg, Cs, Ct, P, S, V
        = lpeg.C, lpeg.Cg, lpeg.Cs, lpeg.Ct, lpeg.P, lpeg.S, lpeg.V

    local backslash = P[[\letterbackslash]]
    local comma     = ","
    local dquote    = P[["]]
    local eol       = S"\n\r"^1
    local noquote   = 1 - dquote
    local unescape  = function (s) return stringsub(s, 2) end
    csv_parser = P{
      "file",
      file    = Ct((V"header" * eol)^-1 * V"records"),
      header  = Cg(Ct(V"name" * (comma * V"name")^0), "header"),
      records = V"record" * (eol * V"record")^0 * eol^0,
      record  = Ct(V"field" * (comma * V"field")^0),
      name    = V"field",
      field   = V"escaped" + V"non_escaped",
      --- Deviate from rfc: the “textdata” terminal was defined only
      --- for 7bit ASCII. Also, any character may occur in a quoted
      --- field as long as it is escaped with a backslash. (\TEX
      --- macros start with two backslashes.)
      escaped     = dquote
                  * Cs(((backslash * 1 / unescape) + noquote)^0)
                  * dquote
                  ,
      non_escaped = C((1 - dquote - eol - comma)^0),
    }
  end

  local process = function (id, raw)
    --- buffers may have trailing EOLs
    raw = stringmatch(raw, "^[\n\r]*(.-)[\n\r]*$")
    local data = lpegmatch(csv_parser, raw)
    --- map column name -> column nr
    data.header = tableswapped(data.header)
    datasets[id] = data
  end

  --- escaping hell ahead, please ignore.
  local s_item = [[
  \bgroup
    \string\def\string\insert{\string\getvalue{csv_insert_field}{%s}{%s}}%%
%s%% template
  \egroup
]]

  local typeset = function (id, template)
    local data   = datasets[id] or die("ERROR unknown dataset: " .. id)
    template     = stringmatch(buffersraw(template), "^[\n\r]*(.-)[\n\r]*$")
    local result = { }
    local last = \letterhash data
    for i=1, last do
      result[i] = stringformat(s_item, id, i, template)
    end
    context(tableconcat(result))
  end

  local insert = function (id, n, field)
    local this = datasets[id]
    context(this[n][this.header[field]])
  end

  commands.process_csv      = process
  commands.process_csv_file = function (id, fname)
    process(id, ioloaddata(fname, true))
  end
  commands.typeset_csv_job  = typeset
  commands.insert_csv_field = insert

\stopluacode

\startinterface all
  \setinterfaceconstant{template}{template}
  \setinterfaceconstant    {data}{data}
\stopinterface

\def\processcsvbuffer[#id][#buf]{%
  \ctxcommand{process_csv([[#id]], buffers.raw(\!!bs#buf\!!es))}%
}

\def\processcsvfile[#id][#filename]{%
  \ctxcommand{process_csv_file([[#id]], \!!bs\detokenize{#filename}\!!es)}%
}

%% modeled after \startbuffer
\setuvalue{\e!start csvtemplate}{%
  \begingroup
  \obeylines
  \dosingleempty\csv_template_start%
}

\def\csv_template_start[#id]{%
  \buff_start_indeed{}{#id}{\e!start csvtemplate}{\e!stop csvtemplate}%
}

\installnamespace                  {csvjob}
\installcommandhandler \????csvjob {csvjob} \????csvjob

\appendtoks
  \setuevalue{\currentcsvjob}{\csv_job_direct[\currentcsvjob]}
\to \everydefinecsvjob

\unexpanded\def\csv_job_direct[#id]{%
  \edef\currentcsvjob{#id}%
  \dosingleempty\csv_job_indeed%
}

\def\csv_job_indeed[#setups]{%
  \iffirstargument\setupcurrentcsvjob[#setups]\fi
  \ctxcommand{typeset_csv_job(
                [[\csvjobparameter\c!data]],
                [[\csvjobparameter\c!template]])}%
}

\def\csv_insert_field#id#n[#field]{%
  \ctxcommand{insert_csv_field([[#id]], #n, [[#field]])}%
}

\protect

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%                               demo
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Stepwise instructions.
%% step 1: Define template.
\startcsvtemplate [tpl]
Dear \insert[Name],

You owe \insert[Amount]. Please send it before \insert[Date].
\par
\stopcsvtemplate

%% step 2: Define an input (CSV).
\startbuffer[csdata]
Name,Amount,Date
"Mr. White","\\letterdollar 300","Dec. 2, 1911"
"Mr. Brown","\\letterdollar 300","Dec. 3, 1911"
"Ms. Premise","\\letterdollar 42","Dec. 4, 1911"
"Ms. Conclusion","\\letterdollar 23","Dec. 5, 1911"
\stopbuffer

%% step 3: Parse and store the input.
\processcsvbuffer[one][csdata]
%\processcsvfile[two][test.csv]

%% step 4: Declare a job, joining dataset and template.
\definecsvjob [testing] [
  data=one,
  template=tpl,
]

%% step 5: Enjoy!
\starttext 
  \testing
\stoptext
