[Tex/LaTex] Plotting experimental data using data files with 5 thousand lines

pgfplots workflow

I want to present various experimental data in my thesis, e.g.:

  • X-ray powder diffraction data (thousand lines)
  • FTIR data (5 thousand lines)
  • TG-/DTA data (5 thousand lines)

and I am wondering how to get a decent and convenient workflow with satisfying results.

I have been playing with pgfplots and was able to create a diagram for my XRD data (1k points) by increasing TeX's memory size, but I failed to create one for my FTIR experiments (5k points). I really want to present my diagrams in a consistent, high-quality manner, but it seems to me that pgfplots is not really convenient for this kind of task.

  • In the case of pgfplots: how can I plot FTIR spectra with more than 5k points? Is it convenient to use pgfplots for this kind of task, and if not…
  • What would you recommend for a good workflow that still gives decent results for the many diagrams in my thesis?

My workflow so far:
For preparation, I used qtiplot to normalize and transform plot data to show various plots in comparison in one diagram. After that I exported this manipulated data table to a .txt file.

\documentclass[a4paper,10pt]{article}
\usepackage[latin9]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
%opening
\title{XRD pattern with pgfplots}
\author{Max Muster}

\begin{document}
\maketitle
\section{section with pgfplots-test}
\begin{tikzpicture}
  \axis[
x dir       = reverse,
xlabel      = pgftemplate,
xmin        = 350,
xmax        = 4000,
yticklabels =
]
  \pgfplotstableread{overview.txt}\datatable
\addplot[color=black,mark=none] table[y = 1st plot, header = false] from \datatable ;
\addplot[color=black,mark=none] table[y = 2nd plot] from \datatable ;
\addplot[color=black,mark=none] table[y = 3rd plot] from \datatable ;
\addplot[color=black,mark=none] table[y = 4th plot] from \datatable ;
\addplot[color=black,mark=none] table[y = 5th plot] from \datatable ;
  \endaxis
\end{tikzpicture}

\end{document}

Best Answer

This answer is referring to your wish to keep these graphs consistent with the rest of your thesis.

I believe that pgfplots can do this kind of stuff, although it takes longer than R.

However, you can easily gain a lot of speed if you change your input method: replace \addplot table {\datatable}; by \addplot table {overview.txt}; and the run time, and probably the memory usage, will go down. At the very least, the time consumption will drop dramatically (see below for an explanation).
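As a minimal sketch of that change applied to the plot from the question: it assumes that overview.txt holds the x values in its first column and the five spectra in the following columns. The column indices are an assumption about the file layout; adjust them, or select columns by name with y=<column name>, to match the actual file.

```latex
\documentclass{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.8}

\begin{document}
\begin{tikzpicture}
  \begin{axis}[
    x dir = reverse,
    xmin  = 350,
    xmax  = 4000,
  ]
    % Each \addplot reads the file directly instead of going through
    % a table that was loaded beforehand with \pgfplotstableread.
    % Assumption: column 0 is x, columns 1 to 5 are the five spectra.
    \addplot[color=black, mark=none] table[x index=0, y index=1] {overview.txt};
    \addplot[color=black, mark=none] table[x index=0, y index=2] {overview.txt};
    \addplot[color=black, mark=none] table[x index=0, y index=3] {overview.txt};
    \addplot[color=black, mark=none] table[x index=0, y index=4] {overview.txt};
    \addplot[color=black, mark=none] table[x index=0, y index=5] {overview.txt};
  \end{axis}
\end{tikzpicture}
\end{document}
```

The file is parsed once per \addplot, which looks wasteful but avoids the slow path through the in-memory table.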

I have just generated a couple of dummy plots to see if it works. I believe they resemble your use-case from a scalability point of view.

\documentclass{standalone}

\usepackage{pgfplots}

\pgfplotsset{compat=1.8}

\begin{document}

    \begin{tikzpicture}
    \begin{axis}
    \addplot[color=red,mark=none,samples=5000,id=1] gnuplot {rand(0)};
    \addplot[color=green,mark=none,samples=5000,id=2] gnuplot {rand(0)};
    \addplot[color=black,mark=none,samples=5000,id=3] gnuplot {rand(0)};
    \addplot[color=blue,mark=none,samples=5000,id=4] gnuplot {rand(0)};
    \addplot[color=orange,mark=none,samples=5000,id=5] gnuplot {rand(0)};
    \end{axis}
    \end{tikzpicture}
\end{document}

This attempt used

Here is how much of TeX's memory you used:
 20614 strings out of 495035
 531998 string characters out of 3781519
 10114824 words of memory out of 15069104
 23486 multiletter control sequences out of 15000+200000
 3640 words of font info for 14 fonts, out of 8000000 for 9000
 14 hyphenation exceptions out of 8191
 62i,10n,76p,693b,1768s stack positions out of 30000i,500n,10000p,200000b,80000s

Afterwards, I re-read the temporary files generated by gnuplot as follows:

\documentclass{standalone}

\usepackage{pgfplots}

\pgfplotsset{compat=1.8}

\begin{document}

\begin{tikzpicture}
\begin{axis}
    \addplot[color=red,mark=none] table {P.1.table};
    \addplot[color=green,mark=none] table {P.2.table};
    \addplot[color=black,mark=none] table {P.3.table};
    \addplot[color=blue,mark=none] table {P.4.table};
    \addplot[color=orange,mark=none] table {P.5.table};
\end{axis}
\end{tikzpicture}

\end{document}

Here is how much of TeX's memory you used:
 20571 strings out of 495035
 531135 string characters out of 3781519
 10112441 words of memory out of 15066721
 23441 multiletter control sequences out of 15000+200000
 3640 words of font info for 14 fonts, out of 8000000 for 9000
 14 hyphenation exceptions out of 8191
 62i,10n,76p,694b,1762s stack positions out of 30000i,500n,10000p,200000b,80000s

It took a couple of seconds to generate these plots, and my numbers give a hint about the required memory settings. Note that I used pdflatex here. lualatex will be even simpler, as it allocates memory dynamically and does not need cumbersome configuration changes.

If you need to do this all the time, you may want to consider alternative solutions.

If you need to do this only occasionally (namely whenever you regenerate your data files), this is OK provided you use the external library: in this case, the system will automatically compile the images into separate PDFs and include them. Ideally, you use

\usetikzlibrary{external}
\tikzexternalize[mode=list and make]

as this will auto-detect if your input files change and will recompile if and only if needed. Use this if you are familiar with make. Let us know if you need help in this kind of setup.
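For orientation, here is a minimal sketch of such a setup. The file name thesis.tex and the figure name ftir-overview are hypothetical, and the plot itself is only a placeholder for your real axis.

```latex
% thesis.tex (hypothetical main file name)
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.8}
\usetikzlibrary{external}
\tikzexternalize[mode=list and make]

\begin{document}
% Optional: give the externalized figure a stable file name
% (the name ftir-overview is only an example).
\tikzsetnextfilename{ftir-overview}
\begin{tikzpicture}
  \begin{axis}
    \addplot[color=black, mark=none] table {overview.txt};
  \end{axis}
\end{tikzpicture}
\end{document}

% Typical build sequence on the command line:
%   1. pdflatex thesis           writes thesis.makefile and the list of figures
%   2. make -f thesis.makefile   compiles figures that are missing or out of date
%   3. pdflatex thesis           includes the generated PDFs
```

With this mode the main document compiles quickly even when the figures are expensive, because the heavy plots are only rebuilt in step 2.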

The reason why \addplot table {<filename>}; is much faster than \addplot table {<\loadedtable>}; is highly unexpected, and it reflects one of the most serious weaknesses of TeX: TeX has neither efficient arrays nor efficient lists. In fact, \addplot table {<\loadedtable>}; has quadratic runtime in the number of data points, whereas \addplot table {<filename>}; is linear up to about 100000 data points. I am unsure, but I believe memory usage also differs (though only by a constant factor).
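To get a feeling for what quadratic versus linear means here, consider a simplified cost model. This is an assumption for illustration, not a measurement of pgfplots internals: suppose that reaching the i-th entry of a list stored in TeX macros costs on the order of i elementary steps, while reading a file costs one step per line. For N = 5000 points this gives (the snippet needs amsmath for \text):

```latex
\[
  \sum_{i=1}^{N} i = \frac{N(N+1)}{2}
  \quad\Longrightarrow\quad
  \frac{5000 \cdot 5001}{2} = 12\,502\,500 \ \text{steps (loaded table)}
  \qquad\text{versus}\qquad
  N = 5000 \ \text{steps (file)}.
\]
```

Under this model, going from 1k to 5k points multiplies the cost of the loaded-table variant by roughly 25, but the cost of the file variant only by 5.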