AFAICT there is nothing built in. But as so often, you can get there by combining some of the existing functionality: all you need is a CSV parser, and buffers can do the rest. (I modified the interface a bit so you can simply write \insert[Field Name] instead of \insertFieldName.) The usage is as follows:
Define a template. In the revised form, your example code would look like this:
\startcsvtemplate [tpl]
Dear \insert[Name],
You owe \insert[Amount]. Please send it before \insert[Date].
\par
\stopcsvtemplate
Trailing newlines are stripped, so you will have to request paragraphs explicitly.
Define an input buffer (optional): Input can be read from a file or from a buffer. In the latter case, the buffer needs to be defined, just like any other buffer:
\startbuffer[csdata]
Name,Amount,Date
"Mr. White","\letterdollar 300","Dec. 2, 1911"
"Mr. Brown","\letterdollar 300","Dec. 3, 1911"
"Ms. Premise","\letterdollar 42","Dec. 4, 1911"
"Ms. Conclusion","\letterdollar 23","Dec. 5, 1911"
\stopbuffer
Request the input to be parsed: Depending on whether you chose to read the data from a buffer or from a file, you will have to process it using the appropriate command:
\processcsvbuffer[one][csdata]
\processcsvfile[two][test.csv]
The first argument of either command is the id by which the dataset can be referenced later (similar to \useexternalfigure[a_cow][cow.pdf]).
Now that dataset and template are in place, you can use them together in a job definition:
\definecsvjob [testing] [
data=two,
template=tpl,
]
This will generate a macro \testing which you can use in your document to generate the output.
\starttext \testing \stoptext
NB: The code below can (and probably should, if used frequently) be improved by defining a proper template language and moving the string processing entirely to Lua. As it stands, performance will be poor due to the repeated calls from TeX into Lua, one per inserted field.
% macros=mkvi
\unprotect
\startluacode
local datasets = { }
local buffersraw = buffers.raw
local context = context
local ioloaddata = io.loaddata
local lpegmatch = lpeg.match
local stringformat = string.format
local stringmatch = string.match
local stringsub = string.sub
local tableconcat = table.concat
local tableswapped = table.swapped
local die = function (msg) print(msg or "ERROR") os.exit(1) end
local csv_parser
do
--- This is (more than) an RFC 4180 parser.
--- https://www.rfc-editor.org/rfc/rfc4180
local C, Cg, Cs, Ct, P, S, V
= lpeg.C, lpeg.Cg, lpeg.Cs, lpeg.Ct, lpeg.P, lpeg.S, lpeg.V
local backslash = P[[\letterbackslash]]
local comma = ","
local dquote = P[["]]
local eol = S"\n\r"^1
local noquote = 1 - dquote
local unescape = function (s) return stringsub(s, 2) end
csv_parser = P{
"file",
file = Ct((V"header" * eol)^-1 * V"records"),
header = Cg(Ct(V"name" * (comma * V"name")^0), "header"),
records = V"record" * (eol * V"record")^0 * eol^0,
record = Ct(V"field" * (comma * V"field")^0),
name = V"field",
field = V"escaped" + V"non_escaped",
--- Deviate from the RFC: the “textdata” terminal was defined only
--- for 7-bit ASCII. Also, any character may occur in a quoted
--- field as long as it is escaped with a backslash. (TeX macros
--- thus start with two backslashes.)
escaped = dquote
* Cs(((backslash * 1 / unescape) + noquote)^0)
* dquote
,
non_escaped = C((1 - dquote - eol - comma)^0),
}
end
local process = function (id, raw)
--- buffers may have trailing EOLs
raw = stringmatch(raw, "^[\n\r]*(.-)[\n\r]*$")
local data = lpegmatch(csv_parser, raw)
--- map column name -> column nr
data.header = tableswapped(data.header)
datasets[id] = data
end
--- escaping hell ahead, please ignore.
local s_item = [[
\bgroup
\string\def\string\insert{\string\getvalue{csv_insert_field}{%s}{%s}}%%
%s%% template
\egroup
]]
local typeset = function (id, template)
local data = datasets[id] or die("ERROR unknown dataset: " .. id)
template = stringmatch(buffersraw(template), "^[\n\r]*(.-)[\n\r]*$")
local result = { }
local last = \letterhash data
for i=1, last do
result[i] = stringformat(s_item, id, i, template)
end
context(tableconcat(result))
end
local insert = function (id, n, field)
local this = datasets[id]
context(this[n][this.header[field]])
end
commands.process_csv = process
commands.process_csv_file = function (id, fname)
process(id, ioloaddata(fname, true))
end
commands.typeset_csv_job = typeset
commands.insert_csv_field = insert
\stopluacode
\startinterface all
\setinterfaceconstant{template}{template}
\setinterfaceconstant{data}{data}
\stopinterface
\def\processcsvbuffer[#id][#buf]{%
\ctxcommand{process_csv([[#id]], buffers.raw(\!!bs#buf\!!es))}%
}
\def\processcsvfile[#id][#filename]{%
\ctxcommand{process_csv_file([[#id]], \!!bs\detokenize{#filename}\!!es)}%
}
%% modeled after \startbuffer
\setuvalue{\e!start csvtemplate}{%
\begingroup
\obeylines
\dosingleempty\csv_template_start%
}
\def\csv_template_start[#id]{%
\buff_start_indeed{}{#id}{\e!start csvtemplate}{\e!stop csvtemplate}%
}
\installnamespace {csvjob}
\installcommandhandler \????csvjob {csvjob} \????csvjob
\appendtoks
\setuevalue{\currentcsvjob}{\csv_job_direct[\currentcsvjob]}
\to \everydefinecsvjob
\unexpanded\def\csv_job_direct[#id]{%
\edef\currentcsvjob{#id}%
\dosingleempty\csv_job_indeed%
}
\def\csv_job_indeed[#setups]{%
\iffirstargument\setupcurrentcsvjob[#setups]\fi
\ctxcommand{typeset_csv_job(
[[\csvjobparameter\c!data]],
[[\csvjobparameter\c!template]])}%
}
\def\csv_insert_field#id#n[#field]{%
\ctxcommand{insert_csv_field([[#id]], #n, [[#field]])}%
}
\protect
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% demo
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Stepwise instructions.
%% step 1: Define template.
\startcsvtemplate [tpl]
Dear \insert[Name],
You owe \insert[Amount]. Please send it before \insert[Date].
\par
\stopcsvtemplate
%% step 2: Define an input (CSV).
\startbuffer[csdata]
Name,Amount,Date
"Mr. White","\\letterdollar 300","Dec. 2, 1911"
"Mr. Brown","\\letterdollar 300","Dec. 3, 1911"
"Ms. Premise","\\letterdollar 42","Dec. 4, 1911"
"Ms. Conclusion","\\letterdollar 23","Dec. 5, 1911"
\stopbuffer
%% step 3: Parse and store the input.
\processcsvbuffer[one][csdata]
%\processcsvfile[two][test.csv]
%% step 4: Declare a job, joining dataset and template.
\definecsvjob [testing] [
data=one,
template=tpl,
]
%% step 5: Enjoy!
\starttext
\testing
\stoptext
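A side note on feeding \processcsvfile: if you generate the CSV file externally, keep the parser's escaping convention in mind. Inside quoted fields a backslash escapes the next character, so TeX macros must be written with two backslashes (the Lua parser strips one of them again). Here is a minimal Python sketch that writes such a file; the name test.csv matches the commented-out \processcsvfile call above, and the rows mirror the demo buffer:
# Write a CSV file in the dialect the Lua parser above expects:
# quoted fields, backslash escapes, TeX macros with two backslashes.
# Python's csv module is deliberately avoided, since its RFC-style
# quote doubling differs from this backslash dialect.
rows = [
    ("Mr. White", "\\\\letterdollar 300", "Dec. 2, 1911"),
    ("Mr. Brown", "\\\\letterdollar 300", "Dec. 3, 1911"),
]
with open("test.csv", "w", encoding="utf8") as f:
    f.write("Name,Amount,Date\n")
    for name, amount, date in rows:
        f.write('"{}","{}","{}"\n'.format(name, amount, date))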
As the mightiest tool in my toolbox is Python, I see Python problems everywhere. (The "when my only tool is a hammer, everything looks like a nail" problem.)
The idea is the following:
- A data file: Excel, CSV, a MySQL database, or whatever you like.
- A LaTeX template with placeholders. I used @@key, as two @ should never occur in a document, but you can think up your own.
- A Python script that fills the placeholders for each row in the data and calls LaTeX to produce the result.
Apart from the standard library you only need pandas for the data-munging part. If you plan to learn Python soon and have a scientific background, which I assume since you are using Mathematica, I recommend installing Python via the Anaconda distribution. It comes bundled with nearly all scientific modules and prebuilt dependencies: http://continuum.io/downloads#py34
This is my data.csv:
productID,firstname,lastname,date
1,Jules,Winnfield,2015-01-01
2,Vincent,Vega,2015-01-02
This is my really simplistic template.tex:
\documentclass{scrartcl}
\usepackage{fontspec}
\begin{document}
\begin{itemize}
\item @@productID
\item @@firstname
\item @@lastname
\end{itemize}
\end{document}
And this is the Python script; please ask if something is not clear:
# codecs provides input/output with specific encoding
import codecs
# os includes operating system operations
import os
# this is needed to call latex from python
from subprocess import call
# pandas for data munging
import pandas
# create folders, no error if it already exists
os.makedirs('tmp', exist_ok=True)
os.makedirs('output', exist_ok=True)
# read in the template:
with codecs.open('template.tex', encoding='utf8') as f:
template = f.read()
data = pandas.read_csv('data.csv')
# show the first 5 rows in the data to have a quick look
print(data.head())
# these are the keys we want to replace with our data:
keys = [
'productID',
'firstname',
'lastname',
]
# now we loop over each row to create a pdf with the data
for index, row in data.iterrows():
filled = template
for key in keys:
# replace our placeholder with the actual data, cast to string first
filled = filled.replace('@@' + key, str(row[key]))
# create a hopefully unique filename
filename = 'filled_{}_{}_{}'.format(
row.lastname,
row.firstname,
row.date,
)
# now we write the filled template to the tmp folder
with codecs.open('tmp/' + filename + '.tex', 'w', encoding='utf8') as f:
f.write(filled)
# and call lualatex or any other latex compiler
# call takes a list of arguments
call(['lualatex',
'--interaction=batchmode',
'--output-directory=tmp',
'tmp/' + filename + '.tex',
])
# there is a missing newline at the end of the latex call
print('\n')
# now move the file to the output folder:
os.rename('tmp/' + filename + '.pdf', 'output/' + filename + '.pdf')
# now we delete the tmp folder
call(['rm', '-rf', 'tmp'])
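One hedged refinement: instead of hard-coding the keys list, you could derive it from the CSV header. This assumes the template contains a matching @@placeholder for each column you care about; columns without a placeholder are harmless, since replace is then a no-op:
# derive the placeholder keys from the CSV header instead of
# hard-coding them; assumes @@<column> appears in the template,
# and extra columns simply match nothing
keys = list(data.columns)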
pandas also provides read_excel, read_sql_table, read_sql_query and many more: http://pandas.pydata.org/pandas-docs/stable/io.html
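Switching the data source only changes the read call; the rest of the script stays the same. A hedged sketch, where data.xlsx, data.sqlite and the products table are illustrative names, not files from the question:
import sqlite3
import pandas

# each reader returns a DataFrame, so the loop above works unchanged
excel_data = pandas.read_excel('data.xlsx')  # needs an Excel engine, e.g. xlrd
conn = sqlite3.connect('data.sqlite')
sql_data = pandas.read_sql_query('SELECT * FROM products', conn)
conn.close()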
Best Answer
A little more detail...
If I start with a spreadsheet which looks like yours, I then save as, pick .csv, and choose ; as the field separator, with nothing to surround text. (This is in Calc, but I assume other software offers similar functionality.) This produces a .csv file, which I saved as question.csv.
I then run a short conversion over this, which produces question.dat. We don't especially want a pdf with the headers in it, but I think it can be useful to have a 'dummy' page just to make sure everything ends up in the right places; you can easily exclude this if you prefer. It would be good to tidy up the stray spaces at the start of some lines, though. Doing so gets me a cleaner question.dat.
You can now use the data in a template .tex file, formatting it as you wish. Just for example, I've used the description environment: as I don't know how long the answers might be, tabular seemed potentially problematic. Compiling the filled template produces a 3-page pdf file.
To separate the pages into separate pdfs, I used pdftk. This gives me pg_0001.pdf, pg_0002.pdf and pg_0003.pdf. The remaining problem is therefore to rename them using the names from the original file. This might be problematic if you have names with accented characters etc. Assuming nothing deviates too far from what your system will accept, extract the name list from the original file. If you need to clean up the name list, do it now; for example, you might need to remove spaces.
Then create a file of mv commands, cmds.list. I'm doing it this way because, if you have a lot of data, storing all the names as arguments is likely to exceed the capacity of your shell; this way, each data entry gets its own command. Now you can run the commands with e.g. sh cmds.list.
This gives me three pdfs named Name.pdf, Name1.pdf and Name2.pdf. Name.pdf is the dummy run, Name1.pdf corresponds to the first data row, and Name2.pdf to the second.
Obviously, this process can be tweaked in various ways and you can combine things in scripts etc. It can also be made more efficient, especially, I think, for the renaming. But the best way to do that probably depends on the details, and hopefully this gives you a starting point if you end up using something like this workflow.
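As one such tweak, the splitting and renaming can be collapsed into a single script. A hedged Python sketch in the spirit of the Python answer above: question.pdf and names.list are assumptions, not names from the original commands, while pdftk's burst operation is what produces the pg_0001.pdf-style names mentioned earlier:
# Hedged sketch, not the original shell commands: split the merged pdf
# with pdftk burst, then rename the per-page files from a name list.
import os
from subprocess import call

# burst writes one file per page: pg_0001.pdf, pg_0002.pdf, ...
call(['pdftk', 'question.pdf', 'burst'])

# names.list is assumed to hold one cleaned-up name per page,
# including one for the dummy header page
with open('names.list', encoding='utf8') as f:
    names = [line.strip() for line in f if line.strip()]

for i, name in enumerate(names, start=1):
    os.rename('pg_{:04d}.pdf'.format(i), name + '.pdf')
Like the cmds.list approach, this performs one rename per file, so it never hits the shell's argument-length limit.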