Import data from a spreadsheet into LaTeX and create a PDF file for each row


I have a form which I ask people to fill out as a Google form, and I get the responses in a spreadsheet (xlsx or ods format). I need to file a report based on each person's answers, which means I have to generate a pdf file for each row of the spreadsheet. Here is an example. Imagine these are the first three rows of the spreadsheet:

Name   Question1  Question2  Question3
Name1  Answer1.1  Answer2.1  Answer 3.1
Name2  Answer1.2  Answer2.2  Answer 3.2

What I need is two pdf files, named Name1.pdf and Name2.pdf, the first of which contains something like this:

Name: Name1
Question1: Answer1.1
Question2: Answer2.1
Question3: Answer 3.1

And so on. I was thinking that one might write a script in Python which reads each row of the file and feeds it as input to the pdflatex command on a template file, but I don't have many ideas about how to actually start on this.

Best Answer

A little more detail...

If I start with a spreadsheet which looks like yours, I then use Save As, pick .csv, and choose ; as the field separator and nothing as the text delimiter. (This is in LibreOffice Calc, but I assume other spreadsheet software offers similar export options.)
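If you prefer to script this step as well, LibreOffice can do the conversion from the command line. A sketch, assuming the spreadsheet is saved as question.ods; the filter options 59,,76 should request ; as the separator, no text delimiter and UTF-8 output, but check the CSV filter options in the LibreOffice documentation:

soffice --headless --convert-to 'csv:Text - txt - csv (StarCalc):59,,76' question.ods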

This produces the following .csv file which I saved as question.csv:

Name;Question1; Question2; Question3
Name1;Answer1.1;Answer2.1;Answer 3.1
Name2;Answer1.2;Answer2.2;Answer 3.2

I then run gawk with the record separator (RS) set to ;, so that each field is printed on its own line; the newlines already in the file conveniently keep the rows apart:

gawk 'BEGIN { RS=";"; ORS="\n" } { print }' question.csv > question.dat

which produces question.dat:

Name
Question1
 Question2
 Question3
Name1
Answer1.1
Answer2.1
Answer 3.1
Name2
Answer1.2
Answer2.2
Answer 3.2

We don't especially want a pdf with the headers in it, but I think it can be useful to have a 'dummy' page just to make sure everything ends up in the right place. However, you can easily exclude it if you prefer (a one-liner for that follows the cleaned-up listing below). It would be good to tidy up the stray spaces at the start of some lines, though:

sed -i 's/^  *//' question.dat

gets me question.dat:

Name
Question1
Question2
Question3
Name1
Answer1.1
Answer2.1
Answer 3.1
Name2
Answer1.2
Answer2.2
Answer 3.2
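
If you would rather not produce the dummy header page at all, one option is to drop the header record before merging. Each record here is exactly four lines, so deleting the first four lines of question.dat does it:

sed -i '1,4d' question.dat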

You can now use the data in a template .tex file, formatting it as you wish. Just as an example, I've used the description environment: I don't know how long the answers might be, so tabular seemed potentially problematic:

\documentclass{article}
\usepackage{textmerg}

\begin{document}

% one field macro for each line of a record, in the order the fields appear
\Fields{\subjectname\questionone\questiontwo\questionthree}

% typeset the body once for every record in question.dat;
% \cleardoublepage starts each report on a fresh page
\Merge{question.dat}{%

\begin{description}
    \item[Name:] \subjectname
    \item[Question1:] \questionone
    \item[Question2:] \questiontwo
    \item[Question3:] \questionthree
\end{description}

\cleardoublepage
}

\end{document}
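
Compile the template in the usual way; assuming it is saved as question.tex:

pdflatex question.tex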

This produces a three-page pdf, question.pdf. To separate the pages into individual pdfs, I used pdftk as follows:

pdftk question.pdf burst

This gives me pg_0001.pdf, pg_0002.pdf and pg_0003.pdf (pdftk also writes a doc_data.txt metadata file, which can be ignored here). The remaining problem is therefore to rename them using the names from the original file. This might be problematic if you have names with accented characters and the like. Assuming nothing deviates too far from what your system will accept:

ls pg_000* > pdf.list
sed 's/\;.*$//' question.csv > name.list

If you need to clean up the name list, do it now. For example, you might need to remove spaces:

sed -i 's/ //g' name.list

Then create a file of mv commands. I'm doing it this way because, if you have a lot of data, passing all the names as arguments to a single command is likely to exceed your shell's limits. This way, each data entry gets its own command.

paste -d ' ' pdf.list name.list | sed -e 's/^/mv /' -e 's/$/.pdf/' > cmds.list
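
With the example data, cmds.list should contain something like:

mv pg_0001.pdf Name.pdf
mv pg_0002.pdf Name1.pdf
mv pg_0003.pdf Name2.pdf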

Now you can run the commands with e.g. sh cmds.list.

This gives me three pdfs named Name.pdf, Name1.pdf and Name2.pdf.

Name.pdf is the dummy run, Name1.pdf corresponds to the first data row, and Name2.pdf to the second.

Obviously, this process can be tweaked in various ways, and the steps can be combined into scripts. It can also be made more efficient, especially, I think, for the renaming. The best way to do that probably depends on the details, but hopefully this gives you a starting point if you end up using something like this workflow.
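
For instance, everything above could be collected into one small shell script. A sketch, assuming the template is saved as question.tex and the dummy header page is kept:

#!/bin/sh
# A sketch of the whole pipeline; question.csv and question.tex are assumed names.
csv=question.csv

# One field per line, with stray leading spaces stripped.
gawk 'BEGIN { RS=";"; ORS="\n" } { print }' "$csv" | sed 's/^  *//' > question.dat

# Merge the data into the template, then split the result into single pages.
pdflatex question.tex
pdftk question.pdf burst

# Rename the burst pages after the (space-stripped) first column of the csv.
ls pg_000* > pdf.list
sed -e 's/;.*$//' -e 's/ //g' "$csv" > name.list
paste -d ' ' pdf.list name.list | sed -e 's/^/mv /' -e 's/$/.pdf/' > cmds.list
sh cmds.list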
