[Tex/LaTex] Take a CSV file, process the data, and generate a report file: where can I learn how to do this?

datatool, MATLAB, report, texstudio

I have access to MATLAB for data processing and TeXstudio for the report. I am about 100x more familiar with MATLAB than with LaTeX. The data flow I am trying to create:

1. A CSV file is generated with 50 columns of data, 1 timestamp column for everything, and 1 column where timestamps for events are noted by the user, since data is collected continuously (they push a button while data is being collected to create the event timestamp).
2. MATLAB looks in the event timestamps column and finds the data at those times by comparing it against the timestamp column to locate the segments of data for processing.
3. MATLAB finds the parameters of the data segments that the user requests (min, max, median, rise times, fall times, etc.).
4. MATLAB stores all of this information in a file somewhere or builds a large table with the information. There may also be graphs created from those data segments in (2).

Here's where LaTeX comes in:

LaTeX will create the report for this data, with pre-entered statements like "During Task X the maximum value reached was Y", where Y is taken from the MATLAB table that was created for that task in (3).
This is repeated for each task. Some data columns may be grouped, and there may be some plots, tables, and figures that would also need to be dispersed throughout the final report.

I can figure out most of the MATLAB portions myself; it is combining the two and the LaTeX portion that I want to learn. Where are the best references or examples for what I am proposing?

The data should look something like this, but with a zillion more rows and data lines (sorry, I can't seem to figure out how to format a table on this thing):

Time        Data Line 1  Data Line 2  Data Line 3  Data Line 4  Data Line 5
7:49:23 AM  0.43493256   851.4415556  0.032704144  -1.24928     -0.016921
7:49:24 AM  0.52979029   851.4415556  0.032704144  -1.24928     -0.016921

Best Answer

Here's an example document using Python, with both code and output.

To use this example, you will need Python. Anaconda is a good option for this sort of thing. You also need the pythontex package. It's in TeX Live. It can also be installed manually; download the latest version from GitHub, extract, then run the Python installer in the pythontex directory.

To compile the document, you will need to use a three-step compile when Python code needs to be executed. Run LaTeX, then pythontex, then LaTeX again. For example, for a file analysis.tex, you might use

pdflatex -interaction=nonstopmode analysis.tex
pythontex analysis.tex
pdflatex -interaction=nonstopmode analysis.tex

When pythontex is properly installed, you should have a wrapper/symlink that allows it to be run from the command line.
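
The three-step compile can also be scripted. The following is only a minimal sketch, assuming `pdflatex` and `pythontex` are on the PATH; the helper names `build_commands` and `compile_report` are made up for this illustration. It deliberately does not stop on a non-zero exit code, because the first LaTeX pass may report errors for things Python has not produced yet (for example, a figure file that does not exist until pythontex has run).

```python
import subprocess
from pathlib import Path


def build_commands(tex_file):
    """Return the three commands of a pythontex build, in order."""
    name = Path(tex_file).name
    latex = ["pdflatex", "-interaction=nonstopmode", name]
    return [latex, ["pythontex", name], latex]


def compile_report(tex_file):
    """Run LaTeX, pythontex, then LaTeX again in the file's directory.

    Returns the exit code of each step instead of raising, since the
    first LaTeX pass is allowed to fail (hypothetical helper).
    """
    workdir = Path(tex_file).resolve().parent
    return [
        subprocess.run(cmd, cwd=workdir).returncode
        for cmd in build_commands(tex_file)
    ]
```

Calling `compile_report("analysis.tex")` would then run the same three commands shown above and return their exit codes.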

Data file data.csv:

Time,A,B,C,D,E
7:49:23 AM,0.43493256,851.4415556,0.032704144,-1.24928,-0.016921
7:49:24 AM,0.52979029,851.4415556,0.132704144,-3.07928,-0.016921
7:49:25 AM,0.34579029,851.4415556,0.173704144,-2.24258,-0.012351

TeX source:

\documentclass{article}

\usepackage{graphicx}
\usepackage{pythontex}

\begin{pycode}
# Import functions etc. that may be needed
from numpy import median, average, mean, std
import collections
from matplotlib import pyplot as plt

# Read in data and parse each column into a list of values
# Put the data in a dictionary, with keys corresponding to column labels
# This assumes simple, well-behaved CSV, with no quoting
with open('data.csv') as f:
    raw_data = f.readlines()
# Add the data file as a dependency to be tracked
# This causes Python code to be re-executed when changes are detected
# Optional, but can be useful
pytex.add_dependencies('data.csv')

# Need a way to convert times into a format that allows easy comparison
# Could also find a library to do this
def time_to_ISO8601_int(t):
    t = t.strip()
    hms, meridian = t.split(' ', 1)
    h, m, s = map(int, hms.split(':'))
    if 'AM' in meridian and h == 12:
        h = 0
    elif 'PM' in meridian and h != 12:
        h += 12
    return h*10000 + m*100 + s

# Store processed data in an ordered dictionary
# Ordered dictionaries store data in order of insertion
# So columns maintain their ordering
data = collections.OrderedDict()
# Create an entry in the data dictionary for each column
# Create an empty list for each column, which will be filled in later
# Python is zero-indexed
header_row = raw_data[0]
for item in header_row.split(','):
    data[item.strip()] = []
# Process data into dictionary
# For very large data sets, a more efficient approach might be beneficial
for line in raw_data[1:]:
    if line.strip():  # skip blank lines; readlines() keeps the trailing newline
        vals = line.split(',')
        vals[0] = time_to_ISO8601_int(vals[0])
        vals[1:] = map(float, vals[1:])
        for n, data_list in enumerate(data.values()):
            data_list.append(vals[n])
\end{pycode}


\begin{document}

Make a table of max values.

\begin{table}[h!]
\begin{center}
\begin{pycode}
print('\\begin{tabular}{|c|c|}')
print('\\hline')
print('Field & Max \\\\')
print('\\hline')
for key in data:
    if key != 'Time':
        print(key + '&' + str(max(data[key])) + '\\\\')
        print('\\hline')
print('\\end{tabular}')
\end{pycode}
\end{center}
\caption{Maximum values}
\end{table}


Plot some data.

\begin{figure}[h!]
\begin{pycode}
plt.figure(figsize=(4,3))
plt.plot(data['Time'], data['A'], label='A')
plt.plot(data['Time'], data['C'], label='C')
plt.plot(data['Time'], data['E'], label='E')
# Prevent plotting from reformatting tick labels
plt.ticklabel_format(useOffset=False, axis='x')
plt.xticks(data['Time'])
plt.xlabel('Time (ISO 8601)')
plt.legend()
plt.savefig('fig.pdf', bbox_inches='tight')
\end{pycode}
\begin{center}
\includegraphics{fig}
\end{center}
\caption{Figure caption.}
\end{figure}

\end{document}
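
The question also asks for per-task statistics and canned sentences, which the document above does not show. Here is a rough sketch of one way to do that in plain Python. It rests on assumptions that are not in the original: that a segment runs from one event time up to (but not including) the next, and that times are written in an English-locale 12-hour format. The function names and the sentence wording are placeholders.

```python
from datetime import datetime
from statistics import median


def parse_time(text):
    """Parse a time such as '7:49:23 AM' (assumes an English locale for AM/PM)."""
    return datetime.strptime(text.strip(), "%I:%M:%S %p").time()


def segment_stats(times, values, start, end):
    """Summarize the values whose timestamps satisfy start <= t < end."""
    segment = [v for t, v in zip(times, values) if start <= t < end]
    return {"min": min(segment), "max": max(segment), "median": median(segment)}


def task_sentence(task, column, stats):
    """Fill a canned report sentence; the wording is only a placeholder."""
    return ("During Task {} the maximum value reached for {} was {:.3f}."
            .format(task, column, stats["max"]))


# Column A from data.csv; the segment boundaries are assumed event times
times = [parse_time(t) for t in ("7:49:23 AM", "7:49:24 AM", "7:49:25 AM")]
column_a = [0.43493256, 0.52979029, 0.34579029]
stats = segment_stats(times, column_a,
                      parse_time("7:49:23 AM"), parse_time("7:49:25 AM"))
print(task_sentence(1, "A", stats))
```

If these functions were defined in a `pycode` environment, the sentence could be placed in running text with pythontex's inline command, for example `\py{task_sentence(1, 'A', stats)}`.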


Related Solutions

[Tex/LaTex] How to export matlab figures to (pdf) LaTeX

Graphics with lots of points are always a challenge for TeX-based processors.

However, I am convinced that both memory and time limitations can be tackled to a "reasonable" degree (i.e. with reduced pain).

There are two solutions which should both be considered:

1. Increase TeX's memory (or circumvent the limitations of pdflatex).
2. Reduce the number of times the graphic is processed by TeX (compile once, use often).

While the comments to your question already indicate some solutions concerning (2.), you may need more input for (1.). In fact, I believe that (1.) is the more pressing issue which cannot easily be solved by (2.).

Concerning (1.), I know that one solution works pretty well: increasing the limits. The pgfplots manual contains detailed instructions for both Windows and Linux on how to enlarge the memory limits. I consider that to be a mandatory step for you, and invite you to follow the link above and read chapter "6 Memory and Speed Considerations" in the pgfplots manual. The chapter contains readily deployable configuration examples. Switching to lualatex instead of the conventional tools (pdflatex or latex/dvips) might also solve the memory problem (I do not know).

Concerning (2.), you can use the standalone package (this site contains a lot of examples). This should work with any of your packages. However, if you use matlab2tikz, I find the TikZ library external very useful here - I tailored it to convert each figure to a separate pdf without changing the original document. Note that matlab2tikz uses pgfplots, so the link mentioned above might be very useful (it also contains a brief description of this automatic image externalization).

I believe that the steps above should help.

But there are always cases where one might also want to know about alternatives.

Here are some of them. I did not post them directly because I have the impression that you may already have an existing workflow and they may not fit - but perhaps you are interested in my experiences anyway:

a) You could try to implement (selected) figures directly in TeX. I did so by means of pgfplots, which is quite powerful. I like the fact that I could define document-wide consistent styles and that the individual documents are, well, often easier to read than autogenerated code. In fact, once I started using pgfplots instead of MATLAB, I found it both simpler to maintain (.tex files instead of .m files) and prettier. I eventually dropped all of my MATLAB scripts and used only pgfplots.

b) if your vector graphics are too large, you may want to consider using bitmap graphics and use TeX to overlay axis descriptions over the bitmap. pgfplots comes with its \addplot graphics and \addplot3 graphics commands to streamline the process. You can also post feature requests to Nico Schloemer (author of matlab2tikz) - perhaps he is willing to add automatic bitmap conversion with overlay axes. Details for such an approach can be found in the aforementioned pgfplots manual (including application examples). Bitmap graphics have the advantage that they render much faster in all viewers - and for surface plots, it does not matter anyway.

[Tex/LaTex] How to convert structured LaTeX file to CSV data

I agree with the comments that you'd need a scripting language to extract the information from your existing documents, but once you have the information in a database you can use the datatool package to generate a report for a particular patient. For the type of information you want to save, I wouldn't recommend using csv as I would find that tricky to edit (but that depends on what you use to edit it). Here's how it might look in csv form (using | as a separator as you requested):

Patient Number|Name|Surname|DoB|Sex|Address|Phone|History|Past History|Personal History|Family History|Examination|Diagnosis|Treatment
0001|Joe|Bloggs|1974-12-06|Male|"1 The Street, The Town"|0123456|Joe's patient history here|Joe's past history here|Joe's personal history|Joe's family history|Joe's examination|\begin{enumerate}\item Joe's diagnosis\end{enumerate}|\begin{enumerate}\item Joe's treatment\end{enumerate}
0002|Jane|Doe|1970-05-18|Female|"2 The Street, The Town"|0123457|Mary's patient history here|Mary's past history here|Mary's personal history|Mary's family history|Mary's examination|\begin{enumerate}\item Mary's diagnosis\end{enumerate}|\begin{enumerate}\item Mary's treatment\end{enumerate}
0003|John|Smith|1969-01-20|Male|"3 The Street, The Town"|0123458|John's patient history here|John's past history here|John's personal history|John's family history|John's examination|\begin{enumerate}\item John's diagnosis\end{enumerate}|\begin{enumerate}\item John's treatment\end{enumerate}

Alternatively, the same information can be stored in a .tex file using the datatool format:

\DTLnewdb{patients}
% Patient Joe Bloggs (patient number 0001)
\DTLnewrow*{patients}
\DTLnewdbentry*{patients}{Patient Number}{0001}
\DTLnewdbentry*{patients}{Name}{Joe}
\DTLnewdbentry*{patients}{Surname}{Bloggs}
\DTLnewdbentry*{patients}{DoB}{1974-12-06}
\DTLnewdbentry*{patients}{Sex}{Male}
\DTLnewdbentry*{patients}{Address}{1 The Street, The Town}
\DTLnewdbentry*{patients}{Phone}{0123456}
\DTLnewdbentry*{patients}{History}{Joe's patient history here}
\DTLnewdbentry*{patients}{Past History}{Joe's past history here}
\DTLnewdbentry*{patients}{Personal History}{Joe's personal history}
\DTLnewdbentry*{patients}{Family History}{Joe's family history}
\DTLnewdbentry*{patients}{Examination}{Joe's examination}
\DTLnewdbentry*{patients}{Diagnosis}
{%
  \begin{enumerate}
    \item Joe's diagnosis
  \end{enumerate}
}
\DTLnewdbentry*{patients}{Treatment}
{%
  \begin{enumerate}
    \item Joe's treatment
  \end{enumerate}
}
% Patient Jane Doe (patient number 0002)
\DTLnewrow*{patients}
\DTLnewdbentry*{patients}{Patient Number}{0002}
\DTLnewdbentry*{patients}{Name}{Jane}
\DTLnewdbentry*{patients}{Surname}{Doe}
\DTLnewdbentry*{patients}{DoB}{1970-05-18}
\DTLnewdbentry*{patients}{Sex}{Female}
\DTLnewdbentry*{patients}{Address}{2 The Street, The Town}
\DTLnewdbentry*{patients}{Phone}{0123457}
\DTLnewdbentry*{patients}{History}{Mary's patient history here}
\DTLnewdbentry*{patients}{Past History}{Mary's past history here}
\DTLnewdbentry*{patients}{Personal History}{Mary's personal history}
\DTLnewdbentry*{patients}{Family History}{Mary's family history}
\DTLnewdbentry*{patients}{Examination}{Mary's examination}
\DTLnewdbentry*{patients}{Diagnosis}
{%
  \begin{enumerate}
    \item Mary's diagnosis
  \end{enumerate}
}
\DTLnewdbentry*{patients}{Treatment}
{%
  \begin{enumerate}
    \item Mary's treatment
  \end{enumerate}
}
% Patient John Smith (patient number 0003)
\DTLnewrow*{patients}
\DTLnewdbentry*{patients}{Patient Number}{0003}
\DTLnewdbentry*{patients}{Name}{John}
\DTLnewdbentry*{patients}{Surname}{Smith}
\DTLnewdbentry*{patients}{DoB}{1969-01-20}
\DTLnewdbentry*{patients}{Sex}{Male}
\DTLnewdbentry*{patients}{Address}{3 The Street, The Town}
\DTLnewdbentry*{patients}{Phone}{0123458}
\DTLnewdbentry*{patients}{History}{John's patient history here}
\DTLnewdbentry*{patients}{Past History}{John's past history here}
\DTLnewdbentry*{patients}{Personal History}{John's personal history}
\DTLnewdbentry*{patients}{Family History}{John's family history}
\DTLnewdbentry*{patients}{Examination}{John's examination}
\DTLnewdbentry*{patients}{Diagnosis}
{%
  \begin{enumerate}
    \item John's diagnosis
  \end{enumerate}
}
\DTLnewdbentry*{patients}{Treatment}
{%
  \begin{enumerate}
    \item John's treatment
  \end{enumerate}
}
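
A file in this format does not have to be typed by hand; a script can generate it from the `|`-separated data. The following is only a sketch using Python's standard `csv` module. The function name `csv_to_datatool` is invented for this illustration, and it assumes that field values are already valid LaTeX (nothing is escaped) and that every row has the same columns as the header.

```python
import csv
import io


def csv_to_datatool(csv_text, db_name):
    """Convert |-separated text with a header row into datatool commands."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="|")
    lines = ["\\DTLnewdb{%s}" % db_name]
    for row in reader:
        lines.append("\\DTLnewrow*{%s}" % db_name)
        for key, value in row.items():
            # Values are copied verbatim, so they must already be valid LaTeX
            lines.append("\\DTLnewdbentry*{%s}{%s}{%s}" % (db_name, key, value))
    return "\n".join(lines)


# A shortened version of the first patient record, for illustration only
sample = (
    "Patient Number|Name|Address\n"
    '0001|Joe|"1 The Street, The Town"\n'
)
print(csv_to_datatool(sample, "patients"))
```

The double quotes around the address are consumed by the CSV reader, so the generated entry contains the bare text, as in the hand-written file above.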

You can access a particular patient like this:

\documentclass{scrartcl}

\usepackage{datatool}

% load from csv file:
%\DTLsetseparator{|}
%\DTLloaddb{patients}{patients.csv}
% or load from .tex file:
\input{patients}

\title{Consultation Report}
\author{}

\newcommand*{\getdetails}[1]{%
  \dtlgetentryfromcurrentrow{\patientdetails}{\dtlcolumnindex{patients}{#1}}%
  \patientdetails
}

\begin{document}
\maketitle

% fetch patient's details (patient number 0002)
\dtlgetrowforvalue{patients}{\dtlcolumnindex{patients}{Patient Number}}{0002}%

\begin{tabular}{ll}
Name: & \getdetails{Name} \getdetails{Surname}\\
DoB: & \getdetails{DoB}\\
Sex: & \getdetails{Sex}\\
Address: & \getdetails{Address}\\
Phone: & \getdetails{Phone}
\end{tabular}

\section*{Clinical details}

\subsection*{Present history}

\getdetails{History}

\subsection*{Past History}

\getdetails{Past History}

\subsection*{Personal history}

\getdetails{Personal History}

\subsection*{Family history }

\getdetails{Family History}

\subsection*{Examination}

\getdetails{Examination}

\section*{Diagnosis}

\getdetails{Diagnosis}

\section*{Rx}

\getdetails{Treatment}

\begin{flushright}
My Name \\
{\footnotesize Qualification
}\\
{\footnotesize designation}\\
{\footnotesize Ph: }
\par\end{flushright}{\footnotesize \par}

\end{document}
