[TeX/LaTeX] Create boxplots from file

boxplot pgfplots tikz-pgf

My question is somewhat related to this thread. I would like
to create box plots from a file. The file (in my case a CSV) looks like this:

Upper,Lower,Min,Max,Median,Name
3,1,0,4,2,First
4,2,1,5,3,Second
...

There are multiple rows, each of which contains fields describing the data points for the box plot (median, position of the whiskers and the box boundaries) and the name of the data set. I would like to create a number of box plots from this data. The name should be added as a legend entry.

I have tried using a CSV reader:

\begin{tikzpicture}
  \begin{axis}[
    \csvreader[
      head to column names,
      separator=comma]
              {data.csv}{}% use head of csv as column names
              {
                \addplot+ [
                  boxplot prepared={
                    lower whisker=\Min,
                    lower quartile=\Lower,
                    median=\Median,
                    upper quartile=\Upper,
                    upper whisker=\Max,
                  },
                ] coordinates {};
              }
  \end{axis}
\end{tikzpicture}

This produces an empty plot, though. My guess is that the macros (\Min and so on) do not work properly inside the tikzpicture/axis environment. (It works when I just print the values outside of the picture.)

I also took a look at the thread mentioned above. The problem in this case is that only the first row of the table is plotted whereas I would like to plot all of them. This shortcoming is mentioned in the comments, but no extension of the approach is mentioned.

What is the best way to add the plots?

Best Answer

You can extend Jake's answer to "Read boxplot prepared values from a table" fairly easily, using \pgfplotsinvokeforeach.

The code below is based on Jake's answer, with just a few modifications:

  • a way of getting the number of rows from a table:

    \pgfplotstablegetrowsof{\datatable}
    \pgfmathtruncatemacro\TotalRows{\pgfplotsretval-1}
    

    Subtract one because row and column indices start at 0.

  • a loop instead of two different \addplots:

    \pgfplotsinvokeforeach{0,...,\TotalRows}{ \addplot .. }
    

    Note the addition of row=#1 in the boxplot options: in \pgfplotsinvokeforeach, the loop variable is represented by #1.

  • area legend added to the \addplot options; otherwise the legend image is a complete (large) box plot.

  • added legend entry from table:

    \pgfplotstablegetelem{#1}{name}\of\datatable
    \addlegendentryexpanded{\pgfplotsretval}
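
The table-lookup steps above can be tried in isolation, without any plot, which may help when checking that the row count and the name column are read as expected. The following is a minimal sketch under the assumption that \pgfplotsinvokeforeach behaves the same in ordinary text as inside an axis; it reuses the sample table from this answer and simply typesets each row's index and name.

```latex
\documentclass{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.8}

% Same sample table as in the full example of this answer
\pgfplotstableread{
    lw lq med  uq uw name
     5  7 8.5 9.5 10 first
     4  5 6.5 8.5 9.5 second
}\datatable

\begin{document}
% The row count is returned in \pgfplotsretval; subtract one for the last 0-based index
\pgfplotstablegetrowsof{\datatable}
\pgfmathtruncatemacro\TotalRows{\pgfplotsretval-1}
Last row index: \TotalRows

% Look up the "name" cell of every row; #1 is the loop variable
\pgfplotsinvokeforeach{0,...,\TotalRows}{%
  \pgfplotstablegetelem{#1}{name}\of\datatable
  \par Row #1 is named \pgfplotsretval
}
\end{document}
```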
    

\documentclass[crop=false]{standalone}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.8}
\usepgfplotslibrary{statistics}
\makeatletter
\pgfplotsset{
    boxplot prepared from table/.code={
        \def\tikz@plot@handler{\pgfplotsplothandlerboxplotprepared}%
        \pgfplotsset{
            /pgfplots/boxplot prepared from table/.cd,
            #1,
        }
    },
    /pgfplots/boxplot prepared from table/.cd,
        table/.code={\pgfplotstablecopy{#1}\to\boxplot@datatable},
        row/.initial=0,
        make style readable from table/.style={
            #1/.code={
                \pgfplotstablegetelem{\pgfkeysvalueof{/pgfplots/boxplot prepared from table/row}}{##1}\of\boxplot@datatable
                \pgfplotsset{boxplot/#1/.expand once={\pgfplotsretval}}
            }
        },
        make style readable from table=lower whisker,
        make style readable from table=upper whisker,
        make style readable from table=lower quartile,
        make style readable from table=upper quartile,
        make style readable from table=median,
        make style readable from table=lower notch,
        make style readable from table=upper notch
}
\makeatother


\pgfplotstableread{
    lw lq med  uq uw name
     5  7 8.5 9.5 10 first
     4  5 6.5 8.5 9.5 second
}\datatable


\begin{document}
\begin{tikzpicture}
\begin{axis}[boxplot/draw direction=y]
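% Row indices are 0-based, so the last index is (number of rows) - 1
\pgfplotstablegetrowsof{\datatable}
\pgfmathtruncatemacro\TotalRows{\pgfplotsretval-1}
% One box plot per table row; #1 is the current row index
\pgfplotsinvokeforeach{0,...,\TotalRows}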
{
  \addplot+[
  boxplot prepared from table={
    table=\datatable,
    row=#1,
    lower whisker=lw,
    upper whisker=uw,
    lower quartile=lq,
    upper quartile=uq,
    median=med
  },
  boxplot prepared,
  % to get a more useful legend
  area legend
  ]
  coordinates {};

  % add legend entry 
  \pgfplotstablegetelem{#1}{name}\of\datatable
  \addlegendentryexpanded{\pgfplotsretval}
}
\end{axis}
\end{tikzpicture}
\end{document}
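
To apply this to the CSV file from the question, one option is to load the file with \pgfplotstableread and the col sep=comma option, then point the keys at the CSV's column names. The following is a sketch rather than a tested drop-in. It assumes the file is called data.csv (written here by filecontents* only so that the example is self-contained), and it repeats the key definitions from the code above, leaving out the two notch keys because the CSV has no notch columns.

```latex
% Write the question's two sample rows to data.csv so the sketch is self-contained.
% An existing data.csv is not overwritten.
\begin{filecontents*}{data.csv}
Upper,Lower,Min,Max,Median,Name
3,1,0,4,2,First
4,2,1,5,3,Second
\end{filecontents*}
\documentclass[crop=false]{standalone}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.8}
\usepgfplotslibrary{statistics}
\makeatletter
% Key definitions copied from the answer above (notch keys omitted)
\pgfplotsset{
    boxplot prepared from table/.code={
        \def\tikz@plot@handler{\pgfplotsplothandlerboxplotprepared}%
        \pgfplotsset{
            /pgfplots/boxplot prepared from table/.cd,
            #1,
        }
    },
    /pgfplots/boxplot prepared from table/.cd,
        table/.code={\pgfplotstablecopy{#1}\to\boxplot@datatable},
        row/.initial=0,
        make style readable from table/.style={
            #1/.code={
                \pgfplotstablegetelem{\pgfkeysvalueof{/pgfplots/boxplot prepared from table/row}}{##1}\of\boxplot@datatable
                \pgfplotsset{boxplot/#1/.expand once={\pgfplotsretval}}
            }
        },
        make style readable from table=lower whisker,
        make style readable from table=upper whisker,
        make style readable from table=lower quartile,
        make style readable from table=upper quartile,
        make style readable from table=median
}
\makeatother

% Read the comma-separated file; the header row supplies the column names
\pgfplotstableread[col sep=comma]{data.csv}\datatable

\begin{document}
\begin{tikzpicture}
\begin{axis}[boxplot/draw direction=y]
\pgfplotstablegetrowsof{\datatable}
\pgfmathtruncatemacro\TotalRows{\pgfplotsretval-1}
\pgfplotsinvokeforeach{0,...,\TotalRows}
{
  \addplot+[
  boxplot prepared from table={
    table=\datatable,
    row=#1,
    % map each box plot key to the matching CSV column
    lower whisker=Min,
    upper whisker=Max,
    lower quartile=Lower,
    upper quartile=Upper,
    median=Median
  },
  boxplot prepared,
  area legend
  ]
  coordinates {};

  % legend entry taken from the Name column
  \pgfplotstablegetelem{#1}{Name}\of\datatable
  \addlegendentryexpanded{\pgfplotsretval}
}
\end{axis}
\end{tikzpicture}
\end{document}
```

Only the table-reading line and the column names differ from the inline-table version, so the loop and legend handling stay the same however many rows the CSV contains.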