[TeX/LaTeX] pgfplots: plotting a large dataset

luatex pdftex pgfplots pgfplotstable tikz-pgf

I am trying to plot a large dataset with pgfplots. Since I'm aware of the problems with large files, I used the external library. I also increased main_memory from 3000000 to 6000000. It crashes nevertheless, saying that pdflatex exceeded its main memory. Yet when I watch my system's memory consumption during compilation, I can't see any increase in memory usage. I couldn't find any explanation of the unit main_memory uses: is it bytes or kilobytes? If I increase it to a larger value, fmtutil-sys --all fails and pdflatex no longer works. Can I get around this somehow?

\documentclass{article}

\usepackage{pgfplots}
\usepackage{pgfplotstable}

\usepgfplotslibrary{external} 
\tikzexternalize

\begin{document}

\begin{tikzpicture}
  \begin{axis}
    \addplot table[x expr=\coordindex, y index=0] {largefile};
  \end{axis}
\end{tikzpicture}

\end{document}

I also tried compiling with lualatex (I just replaced pdflatex with lualatex), but this still seems to call pdflatex for the externalized figure. At least I got the same error message with both lualatex and pdflatex:

! Package tikz Error: Sorry, the system call 'pdflatex -halt-on-error -interact
ion=batchmode -jobname "report-figure0" "\def\tikzexternalrealjob{report}\input
{report}"' did NOT result in a usable output file 'report-figure0' (expected on
e of .pdf:.jpg:.jpeg:.png:). Please verify that you have enabled system calls. 
For pdflatex, this is 'pdflatex -shell-escape'. Sometimes it is also named 'wri
te 18' or something like that. Or maybe the command simply failed? Error messag
es can be found in 'report-figure0.log'. If you continue now, I'll try to types
et the picture.

Excerpt from report-figure0.log:

! TeX capacity exceeded, sorry [main memory size=6000000].
\pgfplotsapplistXXpushback@smallbufoverfl ...toka 
                                                  \the \t@pgfplots@tokb \the...
l.13 ... expr=\coordindex, y index=0] {largefile};
                                                  ^^M 

largefile is 5.7 MB in size and has 593932 data points. I barely dare to say that I also have a 163 MB file with 18928305 data points. I thought this would be no problem, because gnuplot handles those files quickly and without trouble.

head of largefile:

9409252
17298051
21351017
24466010
26952485
29389696
31442872
33345635
35029538
36710432

Please find the dataset here.

I'm using TeX Live 2012:

$ pdflatex
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
 restricted \write18 enabled

$ lualatex
This is LuaTeX, Version beta-0.70.2-2012052410 (TeX Live 2012)
 restricted \write18 enabled.

Best Answer

That's one very densely sampled dataset! As Paul Gaborit pointed out in the comment, you'll want to downsample this first, especially if the data is this smooth. Plotting every single data point will blow up the file size and decrease rendering performance without adding any value to the plot.

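You can use gnuplot within PGFPlots to downsample the data. If you only plot every 1000th point, the plot is indistinguishable from a plot of the full data, and you can stick with pdflatex instead of having to use lualatex. This requires gnuplot to be installed and the document to be compiled with shell escape enabled (pdflatex -shell-escape); the data file is named largefile.csv here.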

\documentclass{article}

\usepackage{pgfplots}
\usepackage{pgfplotstable}

\usepgfplotslibrary{external} 
\tikzexternalize
\pgfplotsset{compat=newest}

\begin{document}

\begin{tikzpicture}
  \begin{axis}
    \addplot [no markers] gnuplot [raw gnuplot] {
        plot "largefile.csv" using ($0*1000):1 every 1000; % $0 is the dummy column for the coordinate index
    };
  \end{axis}
\end{tikzpicture}

\end{document}
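
For comparison, PGFPlots also has a built-in filter for thinning out coordinates. The following is only a sketch: it assumes the keys each nth point, filter discard warning and unbounded coords as described in the PGFPlots manual, uses the file name largefile from the question, and has not been run against this dataset. Because TeX still parses the whole table, it is unlikely to avoid the main memory error for a file of this size under pdflatex.

\begin{tikzpicture}
  \begin{axis}
    % Keep only every 1000th coordinate and silently discard the rest
    \addplot [no markers, each nth point=1000,
              filter discard warning=false, unbounded coords=discard]
      table [x expr=\coordindex, y index=0] {largefile};
  \end{axis}
\end{tikzpicture}

Here the filtering happens only after TeX has read the data, so it reduces the size of the resulting PDF but not the memory needed while reading. The gnuplot variant above filters before TeX ever sees the data, which is why it stays within the memory limit.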

If you need non-uniform sampling, you can just concatenate several plot commands:

\documentclass{article}

\usepackage{pgfplots}
\usepackage{pgfplotstable}

\usepgfplotslibrary{external} 
\tikzexternalize
\pgfplotsset{compat=newest}

\begin{document}

\begin{tikzpicture}
  \begin{axis}
    \addplot [no markers] gnuplot [raw gnuplot] {
        plot "largefile.csv" using ($0):1 every 1::1::300;
        plot "largefile.csv" using ($0*100+300):1 every 100::300::6000;
        plot "largefile.csv" using ($0*1000+6000):1 every 1000::6000
    };
  \end{axis}
\end{tikzpicture}

\end{document}
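
As for the observation in the question that lualatex still called pdflatex: judging from the error message, the command used to compile each externalized figure is set by the key /tikz/external/system call and does not follow the engine that runs the main document. A minimal sketch of overriding it, assuming the default command shown in the error message and in the TikZ manual, with pdflatex swapped for lualatex:

\usepgfplotslibrary{external}
\tikzexternalize
% Assumption: same as the default externalization call, but running lualatex
\tikzset{external/system call={lualatex \tikzexternalcheckshellescape -halt-on-error -interaction=batchmode -jobname "\image" "\texsource"}}

The main document still has to be compiled with shell escape enabled (lualatex -shell-escape). As far as I know, LuaTeX allocates its memory dynamically, so the fixed main_memory limit should not apply there, but plotting every single point will still be slow and produce a large PDF.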