[TeX/LaTeX] Automatic calculation of error in pgfplots

automation calculations pgfplots

When reading data from a file (for example with table[x index=0, y index=1, y error index=2]{plots/mydata.table};), is it possible to have the error bars calculated automatically instead of entering them by hand? Error calculation is a rather straightforward matter, and I would think that a package as complete as pgfplots would include it as a feature.

I'm open to a number of options. Obviously, a pure LaTeX (i.e. pgfplots) solution is best. If there is a way to automate the run of a script upon compilation that alters the data files before their use, that would be okay too (and this is probably the easiest, although I don't know how to effect the automation).

MWE

\documentclass{article}
\usepackage{pgfplots}
\usepackage{tikz}

\pgfplotsset{compat=1.7}

\begin{document}
\begin{tikzpicture}
  \begin{axis}[grid=major]
    \addplot+[smooth,
    error bars/.cd,
    y dir=both,
    y explicit]
    table[x index=0, y index=1, y error index=2]
    {plots/mydata.table}; % simple space-delimited data file
  \end{axis}
\end{tikzpicture}
\end{document}

Sample mydata.table

# Test input file
Sample Measure1 Measure2 Measure3 Measure4 ...
5 180 190 200 210
15 420 410 400 390
25 650 640 630 640
35 1100 1200 1150 1020

I have a script that will produce

5 194.4 4.36898157469
10 195.6 1.4310835056
15 207.4 2.23785611691
20 250.4 1.4587666023

given the input file.

Best Answer

PGFPlots comes with the PGFPlotsTable package, which can process tabular data. It doesn't include functions to calculate row-wise summary statistics like the mean, standard deviation or standard error across a set of data columns, but these can be added quite easily.
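For reference, here is a sketch of the three statistics involved, assuming each row holds n measurements x_1, ..., x_n and that the sample form of the standard deviation (with n-1 in the denominator) is wanted, which is what the macros in the complete example further down compute. The symbols are my own notation, and the display assumes the amsmath package is loaded:

```latex
% Requires \usepackage{amsmath}
\begin{align*}
  \bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
  s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}, &
  \mathrm{SE} &= \frac{s}{\sqrt{n}}.
\end{align*}
```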

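Once the necessary code (given in the complete example below) has been included in the preamble, you can tell PGFPlots to make the row mean and the standard error of the data in columns 2 to 5 (indices 1 to 4, since column indices start at 0) available in columns called mean and stderror by putting the following lines somewhere before your graph:

\pgfplotstableset{
    summary statistics/end index=4, % zero-based index of the last data column
    create on use/mean/.style={create col/mean},
    create on use/stderror/.style={create col/standard error}
}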

By default, the code assumes that the data columns start at index 1 (the second column in the table) and end at index 4 (the fifth column), but this can be changed using the keys summary statistics/start index and summary statistics/end index.
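As an illustration, a hypothetical file with an extra label column in front, so that five measurements sit at indices 2 to 6, might be handled with the sketch below. The index values are assumptions made up for this example, and the two keys only exist once the statistics code from the complete example has been loaded:

```latex
% Assumption: measurements occupy zero-based column indices 2 to 6.
% Place this after the statistics code chunk, before the plot.
\pgfplotstableset{
    summary statistics/start index=2,
    summary statistics/end index=6
}
```

The number of columns used in the mean and in the n-1 denominator is derived from these two keys, so nothing else should need changing.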

Then you can plot the mean values of each row with the error bars representing the standard error of columns 2 to 5 using

  \begin{axis}[grid=major]
    \addplot+[
        smooth,
        error bars/.cd,
            y dir=both,
            y explicit
    ]
    table[
            x=Sample,
            y=mean,
            y error=stderror
    ]
    {data.txt};
  \end{axis}
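If the error bars should show the standard deviation rather than the standard error, the same pattern should work with the create col/standard deviation style defined in the statistics code of this answer. This is a sketch that assumes that code is already in the preamble; the column name stddev is simply the name chosen in the create on use key:

```latex
% Assumes the statistics code chunk from this answer has been loaded.
\pgfplotstableset{
    create on use/mean/.style={create col/mean},
    create on use/stddev/.style={create col/standard deviation}
}
\begin{tikzpicture}
  \begin{axis}[grid=major]
    \addplot+[
        smooth,
        error bars/.cd,
            y dir=both,
            y explicit
    ]
    table[
        x=Sample,
        y=mean,
        y error=stddev % standard deviation instead of standard error
    ]
    {data.txt};
  \end{axis}
\end{tikzpicture}
```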

Here's an example using the data you provided (with the values slightly altered for a more dramatic effect):

\documentclass{article}
\usepackage{pgfplots, pgfplotstable}
\usepackage{filecontents} % only needed with LaTeX kernels older than 2019-10-01


\begin{filecontents*}{data.txt}
Sample Measure1 Measure2 Measure3 Measure4
5 80 190 200 210
15 520 410 430 350
25 650 640 630 900
35 1100 1200 1150 1020
\end{filecontents*}

\pgfplotsset{compat=1.7}


%% Code chunk for statistics starts here...
% Row mean over the data columns (start index to end index); result in \rowmean
\newcommand{\calcrowmean}{
    \def\rowmean{0}
    % Number of data columns: n = end index - start index + 1
    \pgfmathparse{\pgfkeysvalueof{/pgfplots/table/summary statistics/end index}-\pgfkeysvalueof{/pgfplots/table/summary statistics/start index}+1}
    \edef\numberofcols{\pgfmathresult}
    % Loop over all data columns, adding value/n to the running mean
    \pgfplotsforeachungrouped \col in {\pgfkeysvalueof{/pgfplots/table/summary statistics/start index},...,\pgfkeysvalueof{/pgfplots/table/summary statistics/end index}}{
        \pgfmathparse{\rowmean+\thisrowno{\col}/\numberofcols}
        \edef\rowmean{\pgfmathresult}
    }
}
% Sample standard deviation (n-1 in the denominator); result in \pgfmathresult
\newcommand{\calcstddev}{
    \def\rowstddev{0}
    \calcrowmean
    % Accumulate the sample variance; despite its name, \rowstddev holds the variance
    \pgfplotsforeachungrouped \col in {\pgfkeysvalueof{/pgfplots/table/summary statistics/start index},...,\pgfkeysvalueof{/pgfplots/table/summary statistics/end index}}{
        \pgfmathparse{\rowstddev+(\thisrowno{\col}-\rowmean)^2/(\numberofcols-1)}
        \edef\rowstddev{\pgfmathresult}
    }
    \pgfmathparse{sqrt(\rowstddev)}
}
% Standard error = standard deviation / sqrt(n); result in \pgfmathresult
\newcommand{\calcstderror}{
    % \calcstddev already calls \calcrowmean and leaves the variance in \rowstddev
    \calcstddev
    \pgfmathparse{sqrt(\rowstddev)/sqrt(\numberofcols)}
}

\pgfplotstableset{
    summary statistics/start index/.initial=1,
    summary statistics/end index/.initial=4,
    create col/mean/.style={
        /pgfplots/table/create col/assign/.code={% In each row ... 
            \calcrowmean
            \pgfkeyslet{/pgfplots/table/create col/next content}\rowmean
        }
    },
    create col/standard deviation/.style={
        /pgfplots/table/create col/assign/.code={% In each row ... 
            \calcstddev
            \pgfkeyslet{/pgfplots/table/create col/next content}\pgfmathresult
        }
    },
    create col/standard error/.style={
        create col/assign/.code={% In each row ... 
            \calcstderror
            \pgfkeyslet{/pgfplots/table/create col/next content}\pgfmathresult
        }
    }
}
%%...code chunk for statistics ends here

\begin{document}
\pgfplotstableset{
    create on use/mean/.style={create col/mean},
    create on use/stddev/.style={create col/standard deviation},
    create on use/stderror/.style={create col/standard error}
}
\pgfkeys{/pgf/fpu=true} % Only needed for \pgfplotstabletypeset
\pgfplotstabletypeset[columns={Sample, Measure1, Measure2, Measure3, Measure4, mean, stddev, stderror}]{data.txt}
\pgfkeys{/pgf/fpu=false}

\begin{tikzpicture}
  \begin{axis}[grid=major]
    \addplot+[
        smooth,
        error bars/.cd,
            y dir=both,
            y explicit
    ]
    table[
        x=Sample,
        y=mean,
        y error=stderror
    ]
    {data.txt};
  \end{axis}
\end{tikzpicture}
\end{document}
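
As a sanity check on the typeset table, the first row of data.txt (Sample 5 with measurements 80, 190, 200, 210) can be worked through by hand. This is a sketch using the sample (n-1) standard deviation, as \calcstddev does; the values are rounded, and TeX's arithmetic may differ in the last digits:

```latex
% Requires \usepackage{amsmath}
\begin{align*}
  \bar{x} &= \frac{80+190+200+210}{4} = 170,\\
  s^2 &= \frac{(-90)^2+20^2+30^2+40^2}{4-1} = \frac{11000}{3} \approx 3666.67,\\
  s &\approx 60.55,\\
  \mathrm{SE} &= \frac{s}{\sqrt{4}} \approx 30.28.
\end{align*}
```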