AFAICT there is nothing built in. But as so often, you can get there by combining some of the existing functionality: all you need is a CSV parser, and buffers can do the rest. (I modified the interface a bit so you can simply write \insert[Field Name] instead of \insertFieldName.) The usage is as follows:
Define a template. In the revised form, your example code would look like this:
\startcsvtemplate [tpl]
Dear \insert[Name],
You owe \insert[Amount]. Please send it before \insert[Date].
\par
\stopcsvtemplate
Trailing newlines are stripped, so you will have to request paragraphs explicitly.
Define an input buffer (optional): Input can be read from a file or from a buffer. In the latter case, the buffer needs to be defined, just like any other buffer:
\startbuffer[csdata]
Name,Amount,Date
"Mr. White","\letterdollar 300","Dec. 2, 1911"
"Mr. Brown","\letterdollar 300","Dec. 3, 1911"
"Ms. Premise","\letterdollar 42","Dec. 4, 1911"
"Ms. Conclusion","\letterdollar 23","Dec. 5, 1911"
\stopbuffer
Request the input to be parsed: Depending on whether you chose to read the data from a buffer or from a file, you will have to process it using the appropriate command:
\processcsvbuffer[one][csdata]
\processcsvfile[two][test.csv]
The first argument of either command is the id by which the dataset can be referenced later (similar to \useexternalfigure[a_cow][cow.pdf]).
Now that dataset and template are in place, you can use them together in a job definition:
\definecsvjob [testing] [
data=two,
template=tpl,
]
This will generate a macro \testing which you can use in your document to generate the output.
\starttext \testing \stoptext
NB: The code below can (and probably should, if used frequently) be improved by defining a proper template language and moving the string processing entirely to Lua. As it stands, performance will be poor due to the repeated calls from TeX into Lua, one per inserted field.
% macros=mkvi
\unprotect
\startluacode
local datasets = { }
local buffersraw = buffers.raw
local context = context
local ioloaddata = io.loaddata
local lpegmatch = lpeg.match
local stringformat = string.format
local stringmatch = string.match
local stringsub = string.sub
local tableconcat = table.concat
local tableswapped = table.swapped
local die = function (msg) print(msg or "ERROR") os.exit(1) end
local csv_parser
do
--- This is (more than) an RFC 4180 parser.
--- https://www.rfc-editor.org/rfc/rfc4180
local C, Cg, Cs, Ct, P, S, V
= lpeg.C, lpeg.Cg, lpeg.Cs, lpeg.Ct, lpeg.P, lpeg.S, lpeg.V
local backslash = P[[\letterbackslash]]
local comma = ","
local dquote = P[["]]
local eol = S"\n\r"^1
local noquote = 1 - dquote
local unescape = function (s) return stringsub(s, 2) end
csv_parser = P{
"file",
file = Ct((V"header" * eol)^-1 * V"records"),
header = Cg(Ct(V"name" * (comma * V"name")^0), "header"),
records = V"record" * (eol * V"record")^0 * eol^0,
record = Ct(V"field" * (comma * V"field")^0),
name = V"field",
field = V"escaped" + V"non_escaped",
--- Deviate from the RFC: the “textdata” terminal was defined only
--- for 7-bit ASCII. Also, any character may occur in a quoted
--- field as long as it is escaped with a backslash. (TeX macros
--- thus start with two backslashes.)
escaped = dquote
* Cs(((backslash * 1 / unescape) + noquote)^0)
* dquote
,
non_escaped = C((1 - dquote - eol - comma)^0),
}
end
local process = function (id, raw)
--- buffers may have trailing EOLs
raw = stringmatch(raw, "^[\n\r]*(.-)[\n\r]*$")
local data = lpegmatch(csv_parser, raw)
--- map column name -> column nr
data.header = tableswapped(data.header)
datasets[id] = data
end
--- escaping hell ahead, please ignore.
local s_item = [[
\bgroup
\string\def\string\insert{\string\getvalue{csv_insert_field}{%s}{%s}}%%
%s%% template
\egroup
]]
local typeset = function (id, template)
local data = datasets[id] or die("ERROR unknown dataset: " .. id)
template = stringmatch(buffersraw(template), "^[\n\r]*(.-)[\n\r]*$")
local result = { }
local last = \letterhash data
for i=1, last do
result[i] = stringformat(s_item, id, i, template)
end
context(tableconcat(result))
end
local insert = function (id, n, field)
local this = datasets[id]
context(this[n][this.header[field]])
end
commands.process_csv = process
commands.process_csv_file = function (id, fname)
process(id, ioloaddata(fname, true))
end
commands.typeset_csv_job = typeset
commands.insert_csv_field = insert
\stopluacode
\startinterface all
\setinterfaceconstant{template}{template}
\setinterfaceconstant{data}{data}
\stopinterface
\def\processcsvbuffer[#id][#buf]{%
\ctxcommand{process_csv([[#id]], buffers.raw(\!!bs#buf\!!es))}%
}
\def\processcsvfile[#id][#filename]{%
\ctxcommand{process_csv_file([[#id]], \!!bs\detokenize{#filename}\!!es)}%
}
%% modeled after \startbuffer
\setuvalue{\e!start csvtemplate}{%
\begingroup
\obeylines
\dosingleempty\csv_template_start%
}
\def\csv_template_start[#id]{%
\buff_start_indeed{}{#id}{\e!start csvtemplate}{\e!stop csvtemplate}%
}
\installnamespace {csvjob}
\installcommandhandler \????csvjob {csvjob} \????csvjob
\appendtoks
\setuevalue{\currentcsvjob}{\csv_job_direct[\currentcsvjob]}
\to \everydefinecsvjob
\unexpanded\def\csv_job_direct[#id]{%
\edef\currentcsvjob{#id}%
\dosingleempty\csv_job_indeed%
}
\def\csv_job_indeed[#setups]{%
\iffirstargument\setupcurrentcsvjob[#setups]\fi
\ctxcommand{typeset_csv_job(
[[\csvjobparameter\c!data]],
[[\csvjobparameter\c!template]])}%
}
\def\csv_insert_field#id#n[#field]{%
\ctxcommand{insert_csv_field([[#id]], #n, [[#field]])}%
}
\protect
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% demo
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Stepwise instructions.
%% step 1: Define template.
\startcsvtemplate [tpl]
Dear \insert[Name],
You owe \insert[Amount]. Please send it before \insert[Date].
\par
\stopcsvtemplate
%% step 2: Define an input (CSV).
\startbuffer[csdata]
Name,Amount,Date
"Mr. White","\\letterdollar 300","Dec. 2, 1911"
"Mr. Brown","\\letterdollar 300","Dec. 3, 1911"
"Ms. Premise","\\letterdollar 42","Dec. 4, 1911"
"Ms. Conclusion","\\letterdollar 23","Dec. 5, 1911"
\stopbuffer
%% step 3: Parse and store the input.
\processcsvbuffer[one][csdata]
%\processcsvfile[two][test.csv]
%% step 4: Declare a job, joining dataset and template.
\definecsvjob [testing] [
data=one,
template=tpl,
]
%% step 5: Enjoy!
\starttext
\testing
\stoptext
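A side note on feeding \processcsvfile: if you generate the CSV file externally, keep the parser's escaping convention in mind. Inside quoted fields a backslash escapes the next character, so TeX macros must be written with two backslashes (the Lua parser strips one of them again). Here is a minimal Python sketch that writes such a file; the name test.csv matches the commented-out \processcsvfile call above, and the rows mirror the demo buffer:
# Write a CSV file in the dialect the Lua parser above expects:
# quoted fields, backslash escapes, TeX macros with two backslashes.
# Python's csv module is deliberately avoided, since its RFC-style
# quote doubling differs from this backslash dialect.
rows = [
    ("Mr. White", "\\\\letterdollar 300", "Dec. 2, 1911"),
    ("Mr. Brown", "\\\\letterdollar 300", "Dec. 3, 1911"),
]
with open("test.csv", "w", encoding="utf8") as f:
    f.write("Name,Amount,Date\n")
    for name, amount, date in rows:
        f.write('"{}","{}","{}"\n'.format(name, amount, date))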
As the mightiest tool in my toolbox is Python, I see Python problems everywhere. (The "when my only tool is a hammer, everything looks like a nail" problem.)
The idea is the following:
- A data file: Excel, CSV, a MySQL database, or whatever you like.
- A LaTeX template with placeholders. I used @@key, as two @ should never occur in a document, but you can think up your own.
- A Python script that fills the placeholders for each row in the data and calls LaTeX to produce the result.
Apart from the standard library you only need pandas for the data-munging part. If you plan to learn Python soon and have a scientific background, which I assume since you are using Mathematica, I recommend installing Python via the Anaconda distribution. It comes bundled with nearly all scientific modules and prebuilt dependencies: http://continuum.io/downloads#py34
This is my data.csv:
productID,firstname,lastname,date
1,Jules,Winnfield,2015-01-01
2,Vincent,Vega,2015-01-02
This is my really simplistic template.tex:
\documentclass{scrartcl}
\usepackage{fontspec}
\begin{document}
\begin{itemize}
\item @@productID
\item @@firstname
\item @@lastname
\end{itemize}
\end{document}
And this is the Python script; please ask if something is not clear:
# codecs provides input/output with specific encoding
import codecs
# os includes operating system operations
import os
# this is needed to call latex from python
from subprocess import call
# pandas for data munging
import pandas
# create folders, no error if it already exists
os.makedirs('tmp', exist_ok=True)
os.makedirs('output', exist_ok=True)
# read in the template:
with codecs.open('template.tex', encoding='utf8') as f:
template = f.read()
data = pandas.read_csv('data.csv')
# show the first 5 rows in the data to have a quick look
print(data.head())
# these are the keys we want to replace with our data:
keys = [
'productID',
'firstname',
'lastname',
]
# now we loop over each row to create a pdf with the data
for index, row in data.iterrows():
filled = template
for key in keys:
# replace our placeholder with the actual data, cast to string first
filled = filled.replace('@@' + key, str(row[key]))
# create a hopefully unique filename
filename = 'filled_{}_{}_{}'.format(
row.lastname,
row.firstname,
row.date,
)
# now we write the filled template to the tmp folder
with codecs.open('tmp/' + filename + '.tex', 'w', encoding='utf8') as f:
f.write(filled)
# and call lualatex or any other latex compiler
# call takes a list of arguments
call(['lualatex',
'--interaction=batchmode',
'--output-directory=tmp',
'tmp/' + filename + '.tex',
])
# there is a missing newline at the end of the latex call
print('\n')
# now move the file to the output folder:
os.rename('tmp/' + filename + '.pdf', 'output/' + filename + '.pdf')
# now we delete the tmp folder
call(['rm', '-rf', 'tmp'])
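One hedged refinement: instead of hard-coding the keys list, you could derive it from the CSV header. This assumes the template contains a matching @@placeholder for each column you care about; columns without a placeholder are harmless, since replace is then a no-op:
# derive the placeholder keys from the CSV header instead of
# hard-coding them; assumes @@<column> appears in the template,
# and extra columns simply match nothing
keys = list(data.columns)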
pandas also provides read_excel, read_sql_table, read_sql_query and many more: http://pandas.pydata.org/pandas-docs/stable/io.html
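Switching the data source only changes the read call; the rest of the script stays the same. A hedged sketch, where data.xlsx, data.sqlite and the products table are illustrative names, not files from the question:
import sqlite3
import pandas

# each reader returns a DataFrame, so the loop above works unchanged
excel_data = pandas.read_excel('data.xlsx')  # needs an Excel engine, e.g. xlrd
conn = sqlite3.connect('data.sqlite')
sql_data = pandas.read_sql_query('SELECT * FROM products', conn)
conn.close()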
Best Answer
A little more detail...
If I start with a spreadsheet which looks like yours, I then save as, pick .csv, and choose ; as the field separator, with nothing to surround text. (This is in Calc, but I assume other software offers similar functionality.) This produces a .csv file, which I saved as question.csv.
I then run a short conversion over this, which produces question.dat. We don't especially want a pdf with the headers in it, but I think it can be useful to have a 'dummy' page just to make sure everything ends up in the right places; you can easily exclude this if you prefer. It would be good to tidy up the stray spaces at the start of some lines, though. Doing so gets me a cleaner question.dat.
You can now use the data in a template .tex file, formatting it as you wish. Just for example, I've used the description environment: as I don't know how long the answers might be, tabular seemed potentially problematic. Compiling the filled template produces a 3-page pdf file.
To separate the pages into separate pdfs, I used pdftk. This gives me pg_0001.pdf, pg_0002.pdf and pg_0003.pdf. The remaining problem is therefore to rename them using the names from the original file. This might be problematic if you have names with accented characters etc. Assuming nothing deviates too far from what your system will accept, extract the name list from the original file. If you need to clean up the name list, do it now; for example, you might need to remove spaces.
Then create a file of mv commands, cmds.list. I'm doing it this way because, if you have a lot of data, storing all the names as arguments is likely to exceed the capacity of your shell; this way, each data entry gets its own command. Now you can run the commands with e.g. sh cmds.list.
This gives me three pdfs named Name.pdf, Name1.pdf and Name2.pdf. Name.pdf is the dummy run, Name1.pdf corresponds to the first data row, and Name2.pdf to the second.
Obviously, this process can be tweaked in various ways and you can combine things in scripts etc. It can also be made more efficient, especially, I think, for the renaming. But the best way to do that probably depends on the details, and hopefully this gives you a starting point if you end up using something like this workflow.
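As one such tweak, the splitting and renaming can be collapsed into a single script. A hedged Python sketch in the spirit of the Python answer above: question.pdf and names.list are assumptions, not names from the original commands, while pdftk's burst operation is what produces the pg_0001.pdf-style names mentioned earlier:
# Hedged sketch, not the original shell commands: split the merged pdf
# with pdftk burst, then rename the per-page files from a name list.
import os
from subprocess import call

# burst writes one file per page: pg_0001.pdf, pg_0002.pdf, ...
call(['pdftk', 'question.pdf', 'burst'])

# names.list is assumed to hold one cleaned-up name per page,
# including one for the dummy header page
with open('names.list', encoding='utf8') as f:
    names = [line.strip() for line in f if line.strip()]

for i, name in enumerate(names, start=1):
    os.rename('pg_{:04d}.pdf'.format(i), name + '.pdf')
Like the cmds.list approach, this performs one rename per file, so it never hits the shell's argument-length limit.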