[TeX/LaTeX] Simpler boxplots in pgfplots: is this possible?

boxplot pgfplots

When I wanted to plot boxplots in pgfplots, I started using the solution proposed by Jake in the question linked to above. However, as soon as I wanted to plot more than one set of boxplots in the same figure (i.e., using groupplots), the plot definition quickly became unwieldy: with four \addplot commands for the boxes alone, plus another for the outliers, I had 10+ \addplot commands per figure.

What I'm after is a simpler way of doing this. Ideally, I'd like a style definition that takes care of everything the \addplot commands have been doing so far, so that I could simply write

\addplot [boxplot] table {testdata.dat};

and be done with it, while still being able to pass additional options to specify colour, marker type and so on. I tried a \newcommand, which worked, but I couldn't manage to pass options through to pgfplots, so I started looking for alternatives.

What I think I need is an approach similar to the one used for scatter plots in pgfplots (like Jake's solution for this question), so that I could specify the coordinates of the box and the whiskers as several point meta values. However, the solutions I've found all seem to use 3 or at most 4 numbers to specify each mark on the plot, and for boxplots I need 6.

I've managed to get quite close by putting TikZ commands equivalent to Jake's into a scatter/@pre marker code block, but I seem to be having scale problems despite using disabledatascaling and similar options, which, as I understand it, should solve the issue. The problem here is that, unlike in the last question I linked to, I (think I) cannot use \pgfplotspointmeta, since I'm trying to recover more than one value per point. Then again, I might be barking up the wrong tree.

Here's a minimal (not) working example:

\documentclass{article}
\usepackage{pgfplots}

\pgfplotsset{
  boxplot/.style={
    mark=-,
    mark size=0.5em,
    scatter,
    point meta=0,
    only marks,
    axis equal,
    disabledatascaling,
    visualization depends on={\thisrow{boxtop}        \as \boxtop},
    visualization depends on={\thisrow{boxbottom}     \as \boxbottom},
    visualization depends on={\thisrow{whiskertop}    \as \whiskertop},
    visualization depends on={\thisrow{whiskerbottom} \as \whiskerbottom},
    visualization depends on={\thisrow{x}             \as \bpx},
    scatter/@pre marker code/.append code={
      \draw % box
        (axis cs:\bpx,\boxbottom)     -- ++ (  0.5em, 0pt) |-
        (axis cs:\bpx,\boxtop)        -- ++ ( -0.5em, 0pt) |-
        (axis cs:\bpx,\boxbottom)     -- cycle;
      \path % top whisker
        (axis cs:\bpx,\boxtop)        -- (axis cs:\bpx,\whiskertop);
      \path % bottom whisker
        (axis cs:\bpx,\boxbottom)     -- (axis cs:\bpx,\whiskerbottom);
      \path % top whisker marker
        (axis cs:\bpx,\whiskertop)    -- ++ ( 0.25em, 0pt) --
        (axis cs:\bpx,\whiskertop)    -- ++ (-0.25em, 0pt);
      \path % bottom whisker marker
        (axis cs:\bpx,\whiskerbottom) -- ++ ( 0.25em, 0pt) --
        (axis cs:\bpx,\whiskerbottom) -- ++ (-0.25em, 0pt);
    }
  }
}

\begin{document}
  \begin{figure}
    \begin{tikzpicture}
      \begin{axis}
        \addplot [boxplot] table[y=median] {
            x whiskerbottom boxbottom median boxtop whiskertop 
            1 42            45        47     47     48 
            2 36            39        40     41     43 
            3 41            44        45     46     47 
            4 20            29        31     36     38 
            5 31            32        34     36     39 
        };
      \end{axis}
    \end{tikzpicture}
  \end{figure}
\end{document}

The problem with this is, as I said above, that the TikZ commands don't seem to use the same coordinate system as the axis (despite my attempts to enforce it with axis cs; I actually get better results without it).


It seems as if all I need is to make sure pgfplots and TikZ use the same scale, but I'm fresh out of ideas. Can this be done? Or is the approach just flawed, and should I work around the problem instead of solving it?

Best Answer

PGFPlots supports boxplots natively as of version 1.8, through its statistics library. See Boxplot in LaTeX for an example.
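For reference, here is a minimal sketch of the native approach, assuming pgfplots 1.8 or later. The statistics library's boxplot prepared key takes precomputed statistics, so each row of the question's table becomes one \addplot. The values are the first two rows of the question's data; consecutive box plots should by default be placed at x = 1, 2, and so on, and the empty coordinates {} means no outliers are drawn.

```latex
\documentclass{article}
\usepackage{pgfplots}
\usepgfplotslibrary{statistics} % boxplot handlers, pgfplots 1.8 and later
\pgfplotsset{compat=1.8}

\begin{document}
\begin{tikzpicture}
  \begin{axis}
    % Row x=1 of the question's table (whiskers 42/48, box 45/47, median 47)
    \addplot+ [boxplot prepared={
        lower whisker=42, lower quartile=45, median=47,
        upper quartile=47, upper whisker=48
      }] coordinates {}; % no outliers
    % Row x=2 of the question's table
    \addplot+ [boxplot prepared={
        lower whisker=36, lower quartile=39, median=40,
        upper quartile=41, upper whisker=43
      }] coordinates {};
  \end{axis}
\end{tikzpicture}
\end{document}
```

If you have the raw samples rather than precomputed statistics, the plain boxplot key of the same library computes the quartiles itself, which comes close to the one-liner asked for in the question, e.g. `\addplot+ [boxplot] table [y index=0] {samples.dat};` (samples.dat being a hypothetical one-column file of raw values).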

The remainder of this answer should be considered obsolete.


You're right to ask about this: the current code is not very convenient to use (although it has nonetheless proved surprisingly useful to me in the past).

Your approach is very attractive in how much simpler the code is. However, when I first wrote the box plot code, I decided to use several \addplot commands because that's the easiest way to get PGFPlots to take the boxes and whiskers into account when calculating the axis ranges; anything drawn inside marker code is invisible to that calculation.

I've modified my code so that it now provides a new command \boxplot[<optional keys>]{<data table>}. You can also tell the command which columns hold the different components of the box plots by setting box plot median index=<column index>, box plot whisker top index=<column index>, and so on. The box width can be adjusted with box plot width (see the question PGFplots and boxplots: How to tune width and separation of boxes?).
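Since the column indices are ordinary pgfkeys, one way to avoid repeating them for every file in the same layout is to bundle them in a style. This is a sketch: the style name question format is made up, and it assumes the key definitions and \boxplot from the complete code further down.

```latex
% Hypothetical style: the column layout of testdata2.dat
% (x, whiskerbottom, boxbottom, median, boxtop, whiskertop), 0-based.
\pgfplotsset{
    question format/.style={
        box plot whisker bottom index=1,
        box plot box bottom index=2,
        box plot median index=3,
        box plot box top index=4,
        box plot whisker top index=5,
    },
}

% Later, inside an axis environment:
\boxplot [forget plot, question format] {testdata2.dat}
```

Because the style simply expands to the same keys, this should behave exactly like passing the five index settings inline.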

By default, only one legend entry is created per box plot. If you want to avoid creating legend entries for the box plots entirely, add forget plot to the \boxplot options.
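Conversely, to give a box plot a legend entry, leave out forget plot and follow the \boxplot call with \addlegendentry. The entry should attach to the median plot, since that is the only one of the four \addplot commands inside \boxplot that is not forgotten. A sketch, assuming the preamble of the complete code below; the legend text is arbitrary.

```latex
\begin{tikzpicture}
\begin{axis} [box plot width=2mm]
% No "forget plot": the median plot inside \boxplot keeps its legend slot,
% so the following \addlegendentry is assigned to it.
\boxplot [red] {testdata.dat}
\addlegendentry{testdata.dat}
\end{axis}
\end{tikzpicture}
```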

Using the following code (testdata.dat is in my format, testdata2.dat in yours)

\begin{axis} [box plot width=2mm]
\boxplot [forget plot, red] {testdata.dat}
\boxplot [
    forget plot,
    box plot whisker bottom index=1,
    box plot whisker top index=5,
    box plot box bottom index=2,
    box plot box top index=4,
    box plot median index=3
] {testdata2.dat}
\addplot [domain=-2:6, thick, cyan] {-x+25+rnd}; \addlegendentry{Some line}
\end{axis}

you can now get both sets of box plots, together with the line, in a single axis.


Complete code:

\documentclass{article}
\usepackage{pgfplots}
\usepackage{filecontents}

\begin{filecontents}{testdata.dat}
0 10 12 4 15 2
1 20 23 15 27 10
2 7 14 5 19 1
\end{filecontents}

\begin{filecontents}{testdata2.dat}
x whiskerbottom boxbottom median boxtop whiskertop 
1 42            45        47     47.5     48 
2 36            39        40     41     43 
3 41            44        45     46     47 
4 20            29        31     36     38 
5 31            32        34     36     39 
\end{filecontents}

\pgfplotsset{
    box plot/.style={
        /pgfplots/.cd,
        black,
        only marks,
        mark=-,
        mark size=\pgfkeysvalueof{/pgfplots/box plot width},
        /pgfplots/error bars/y dir=plus,
        /pgfplots/error bars/y explicit,
        /pgfplots/table/x index=\pgfkeysvalueof{/pgfplots/box plot x index},
    },
    box plot box/.style={
        /pgfplots/error bars/draw error bar/.code 2 args={%
            \draw  ##1 -- ++(\pgfkeysvalueof{/pgfplots/box plot width},0pt) |- ##2 -- ++(-\pgfkeysvalueof{/pgfplots/box plot width},0pt) |- ##1 -- cycle;
        },
        /pgfplots/table/.cd,
        y index=\pgfkeysvalueof{/pgfplots/box plot box top index},
        y error expr={
            \thisrowno{\pgfkeysvalueof{/pgfplots/box plot box bottom index}}
            - \thisrowno{\pgfkeysvalueof{/pgfplots/box plot box top index}}
        },
        /pgfplots/box plot
    },
    box plot top whisker/.style={
        /pgfplots/error bars/draw error bar/.code 2 args={%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark}%
            {\pgfplotserrorbarsmark}%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark options}%
            {\pgfplotserrorbarsmarkopts}%
            \path ##1 -- ##2;
        },
        /pgfplots/table/.cd,
        y index=\pgfkeysvalueof{/pgfplots/box plot whisker top index},
        y error expr={
            \thisrowno{\pgfkeysvalueof{/pgfplots/box plot box top index}}
            - \thisrowno{\pgfkeysvalueof{/pgfplots/box plot whisker top index}}
        },
        /pgfplots/box plot
    },
    box plot bottom whisker/.style={
        /pgfplots/error bars/draw error bar/.code 2 args={%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark}%
            {\pgfplotserrorbarsmark}%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark options}%
            {\pgfplotserrorbarsmarkopts}%
            \path ##1 -- ##2;
        },
        /pgfplots/table/.cd,
        y index=\pgfkeysvalueof{/pgfplots/box plot whisker bottom index},
        y error expr={
            \thisrowno{\pgfkeysvalueof{/pgfplots/box plot box bottom index}}
            - \thisrowno{\pgfkeysvalueof{/pgfplots/box plot whisker bottom index}}
        },
        /pgfplots/box plot
    },
    box plot median/.style={
        /pgfplots/box plot,
        /pgfplots/table/y index=\pgfkeysvalueof{/pgfplots/box plot median index}
    },
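    box plot width/.initial=1em,              % box extends this far left and right of x
    box plot x index/.initial=0,              % default column layout (0-based),
    box plot median index/.initial=1,         % i.e. the format of testdata.dat:
    box plot box top index/.initial=2,        % x, median, box top, box bottom,
    box plot box bottom index/.initial=3,     % whisker top, whisker bottom
    box plot whisker top index/.initial=4,
    box plot whisker bottom index/.initial=5,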
}

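% \boxplot[<options>]{<file>}: one plot for the median marks plus three
% forgotten helper plots (box and two whiskers) reading the same table.
% #1 comes last in each option list, so user options override the defaults.
\newcommand{\boxplot}[2][]{
    \addplot [box plot median,#1] table {#2};                      % median marks
    \addplot [forget plot, box plot box,#1] table {#2};            % box, drawn as an error bar
    \addplot [forget plot, box plot top whisker,#1] table {#2};    % upper whisker
    \addplot [forget plot, box plot bottom whisker,#1] table {#2}; % lower whisker
}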

\begin{document}
\begin{tikzpicture}
\begin{axis} [box plot width=2mm]
\boxplot [forget plot, red] {testdata.dat}
\boxplot [
    forget plot,
    box plot whisker bottom index=1,
    box plot whisker top index=5,
    box plot box bottom index=2,
    box plot box top index=4,
    box plot median index=3
] {testdata2.dat}
\addplot [domain=-2:6, thick, cyan] {-x+25+rnd}; \addlegendentry{Some line}
\end{axis}
\end{tikzpicture}
\end{document}
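Since the original goal was several sets of box plots in one figure, here is a sketch of the same two data files in a groupplot, one per panel. It assumes the preamble of the complete code above, with \usepgfplotslibrary{groupplots} added to it; options given to the groupplot environment, like box plot width here, apply to every panel.

```latex
% Assumes the complete preamble above plus:
% \usepgfplotslibrary{groupplots}
\begin{tikzpicture}
\begin{groupplot}[group style={group size=2 by 1}, box plot width=2mm]
\nextgroupplot % left panel: data in my format
\boxplot [forget plot, red] {testdata.dat}
\nextgroupplot % right panel: data in the question's format
\boxplot [
    forget plot,
    box plot whisker bottom index=1,
    box plot whisker top index=5,
    box plot box bottom index=2,
    box plot box top index=4,
    box plot median index=3
] {testdata2.dat}
\end{groupplot}
\end{tikzpicture}
```

Each panel still needs only one \boxplot call per data set, instead of four or five \addplot commands.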