[Tex/LaTex] Unmodified boxplot with pgfplots

boxplot pgfplots

When pgfplots creates a boxplot, it draws a modified box plot by default: the whiskers extend to the largest and smallest non-outlier values, and the outliers are drawn as dots. Is there a way to make the whiskers extend to the overall largest and smallest values instead, without having to specify them explicitly with boxplot prepared?

MWE:

\documentclass[12pt]{article}

\usepackage{pgfplots}
\usepackage{filecontents}

\usepgfplotslibrary{statistics}

\begin{filecontents}{data.txt}
  497
  222
  849
  480
  477
  662
  1005
  460
  1131
  540
  506
  867
  706
  1558
  954
\end{filecontents}

\begin{document}
  \begin{tikzpicture}
    \begin{axis}[width=\linewidth, height=4cm, enlarge y limits=.2,
                 ytick={1, 2}, yticklabels={Wrong, Right}]
      \addplot+ [boxplot] table [y index=0] {data.txt};
      % Ignore the miscalculations here
      \addplot+ [boxplot prepared={draw position=2, 
                                   median=662,
                                   lower whisker=222,
                                   upper whisker=1558,
                                   upper quartile=867, 
                                   lower quartile=478.5}] coordinates {};
    \end{axis}
  \end{tikzpicture}
\end{document}

Best Answer

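One solution is to set whisker range, the key that determines which points count as outliers, to a very large value. No data point is then classified as an outlier, so the whiskers extend to the smallest and largest values.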
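To see why a large value works, here is a sketch of the usual whisker rule, assuming pgfplots follows the standard Tukey-style definition with whisker range as the multiplier r (initially 1.5). The symbols r, Q_1, Q_3, and x_i are notation introduced here for illustration. The numbers use the quartile values quoted in the question, which its author flags as approximate, so the quartiles pgfplots computes itself may differ slightly.

```latex
\begin{align*}
  \mathrm{IQR} &= Q_3 - Q_1 = 867 - 478.5 = 388.5,\\
  \text{lower whisker} &= \min\{\, x_i : x_i \ge Q_1 - r\cdot\mathrm{IQR} \,\},\\
  \text{upper whisker} &= \max\{\, x_i : x_i \le Q_3 + r\cdot\mathrm{IQR} \,\},\\[4pt]
  % Default r = 1.5: the largest value lies beyond the upper fence
  Q_3 + 1.5\cdot\mathrm{IQR} &= 867 + 582.75 = 1449.75 < 1558,\\[4pt]
  % Smallest r for which both extremes fall inside the fences
  r &\ge \max\!\left(\frac{1558 - 867}{388.5},\ \frac{478.5 - 222}{388.5}\right)
     \approx \max(1.78,\ 0.66) = 1.78.
\end{align*}
```

Under these assumptions the default setting treats 1558 as an outlier and ends the upper whisker at 1131, while any r above roughly 1.78 would already put the whiskers at 222 and 1558 for this data set. Choosing a very large constant simply avoids having to work out this bound for each plot.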

\documentclass[12pt]{article}

\usepackage{pgfplots}
\usepackage{filecontents}

\pgfplotsset{width=\textwidth, compat=1.12}
\usepgfplotslibrary{statistics}

%% Must exceed, for every boxplot in the document, the distance from a
%% quartile to the farthest data point divided by the interquartile range.
\newcommand\boxplotbignum{1000000}

\begin{filecontents}{data.txt}
  497
  222
  849
  480
  477
  662
  1005
  460
  1131
  540
  506
  867
  706
  1558
  954
\end{filecontents}

\begin{document}
  \begin{tikzpicture}
    \begin{axis}[width=\linewidth, height=4cm, enlarge y limits=.2,
                 ytick={1, 2}, yticklabels={Wrong, Right}]
      \addplot+ [boxplot={whisker range=\boxplotbignum}] table [y index=0] {data.txt};
      % Ignore the miscalculations here
      \addplot+ [boxplot prepared={draw position=2, 
                                   median=662,
                                   lower whisker=222,
                                   upper whisker=1558,
                                   upper quartile=867, 
                                   lower quartile=478.5}] coordinates {};
    \end{axis}
  \end{tikzpicture}
\end{document}
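If every boxplot in the document should behave this way, the key can presumably be set once instead of per plot. The following is a minimal sketch, assuming that the full key path /pgfplots/boxplot/whisker range can be set through \pgfplotsset after the statistics library has been loaded. The data is the set from the question, given inline so that the example is self-contained.

```latex
\documentclass[12pt]{article}

\usepackage{pgfplots}
\pgfplotsset{compat=1.12}
\usepgfplotslibrary{statistics}

% Assumption: setting the key here applies it to every later boxplot.
% It must come after \usepgfplotslibrary{statistics}, which defines the key.
\pgfplotsset{boxplot/whisker range=1000000}

\begin{document}
  \begin{tikzpicture}
    \begin{axis}[width=\linewidth, height=4cm, ytick=\empty]
      % No per-plot option needed; whiskers should reach 222 and 1558
      \addplot+ [boxplot] table [row sep=\\, y index=0] {
        data\\
        497\\ 222\\ 849\\ 480\\ 477\\
        662\\ 1005\\ 460\\ 1131\\ 540\\
        506\\ 867\\ 706\\ 1558\\ 954\\
      };
    \end{axis}
  \end{tikzpicture}
\end{document}
```

A per-plot boxplot={whisker range=...} option, as in the answer above, should still override this default for individual plots.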

Related Solutions

[Tex/LaTex] use pgfplots to make a boxplot

PGFPlots supports boxplots natively as of version 1.8. See "Boxplot in LaTeX" for an example.
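For orientation, a minimal sketch of the native interface might look like the following. It assumes PGFPlots 1.8 or later with the statistics library loaded, and the sample values are made up purely for illustration.

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.8}
\usepgfplotslibrary{statistics} % provides the boxplot plot handler

\begin{document}
\begin{tikzpicture}
  \begin{axis}[height=4cm, ytick=\empty]
    % Inline sample data (illustrative only); each row ends with \\
    \addplot+ [boxplot] table [row sep=\\, y index=0] {
      data\\
      1\\ 2\\ 1\\ 5\\ 4\\ 10\\
      7\\ 10\\ 9\\ 8\\ 9\\ 9\\
    };
  \end{axis}
\end{tikzpicture}
\end{document}
```

Here PGFPlots computes the median, quartiles, and whiskers itself, so no external quantile calculation is needed.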

The remainder of this answer should be considered obsolete.

There is a much improved version of this code at "Simpler boxplots in pgfplots - is this possible?". It allows creating box plots with a single command and adds much more flexibility to the data format and the plot styles.

Original answer:

PGFPlots cannot do this out of the box, and you have to do the quantile calculations outside of PGFPlots, but you can then draw box plots with a bit of style trickery.

This code

\begin{axis} [enlarge x limits=0.5,xtick=data]
    \addplot [box plot median] table {testdata.dat};
    \addplot [box plot box] table {testdata.dat};
    \addplot [box plot top whisker] table {testdata.dat};
    \addplot [box plot bottom whisker] table {testdata.dat};
\end{axis}

can generate this plot

[Figure: box plots with pgfplots]

if testdata.dat is of the form

index median box_top box_bottom whisker_top whisker_bottom

Here's a full compilable example:

\documentclass{article}
\usepackage{pgfplots}
\usepackage{filecontents}

\begin{filecontents}{testdata.dat}
0 1 1.2 0.4 1.5 0.2
1 2 2.3 1.5 2.7 1
2 0.7 1.4 0.5 1.9 0.1
\end{filecontents}

\pgfplotsset{
    box plot/.style={
        /pgfplots/.cd,
        black,
        only marks,
        mark=-,
        mark size=1em,
        /pgfplots/error bars/.cd,
        y dir=plus,
        y explicit,
    },
    box plot box/.style={
        /pgfplots/error bars/draw error bar/.code 2 args={%
            \draw  ##1 -- ++(1em,0pt) |- ##2 -- ++(-1em,0pt) |- ##1 -- cycle;
        },
        /pgfplots/table/.cd,
        y index=2,
        y error expr={\thisrowno{3}-\thisrowno{2}},
        /pgfplots/box plot
    },
    box plot top whisker/.style={
        /pgfplots/error bars/draw error bar/.code 2 args={%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark}%
            {\pgfplotserrorbarsmark}%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark options}%
            {\pgfplotserrorbarsmarkopts}%
            \path ##1 -- ##2;
        },
        /pgfplots/table/.cd,
        y index=4,
        y error expr={\thisrowno{2}-\thisrowno{4}},
        /pgfplots/box plot
    },
    box plot bottom whisker/.style={
        /pgfplots/error bars/draw error bar/.code 2 args={%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark}%
            {\pgfplotserrorbarsmark}%
            \pgfkeysgetvalue{/pgfplots/error bars/error mark options}%
            {\pgfplotserrorbarsmarkopts}%
            \path ##1 -- ##2;
        },
        /pgfplots/table/.cd,
        y index=5,
        y error expr={\thisrowno{3}-\thisrowno{5}},
        /pgfplots/box plot
    },
    box plot median/.style={
        /pgfplots/box plot
    }
}

\begin{document}
\begin{tikzpicture}
\begin{axis} [enlarge x limits=0.5,xtick=data]
    \addplot [box plot median] table {testdata.dat};
    \addplot [box plot box] table {testdata.dat};
    \addplot [box plot top whisker] table {testdata.dat};
    \addplot [box plot bottom whisker] table {testdata.dat};
\end{axis}
\end{tikzpicture}
\end{document}

[Tex/LaTex] plotting two time series with bounds

This happens because PGFPlots only uses one "stack" per axis: You're stacking the second confidence interval on top of the first. The easiest way to fix this is probably to use the approach described in "Is there an easy way of using line thickness as error indicator in a plot?": After plotting the first confidence interval, stack the upper bound on top again, using stack dir=minus. That way, the stack will be reset to zero, and you can draw the second confidence interval in the same fashion as the first:

\documentclass{standalone}
\usepackage{pgfplots, tikz}

\usepackage{pgfplotstable}

\pgfplotstableread{
temps   y_h y_h__inf    y_h__sup    y_f y_f__inf    y_f__sup    

1   0.237340    0.135170    0.339511    0.237653    0.135482    0.339823    
2   0.561320    0.422007    0.700633    0.165871    0.026558    0.305184    
3   0.694760    0.534205    0.855314    0.074856    -0.085698   0.235411    
4   0.728306    0.560179    0.896432    0.003361    -0.164765   0.171487    
5   0.711710    0.544944    0.878477    -0.044582   -0.211349   0.122184    
6   0.671241    0.511191    0.831291    -0.073347   -0.233397   0.086703    
7   0.621177    0.471219    0.771135    -0.088418   -0.238376   0.061540    
8   0.569354    0.431826    0.706882    -0.094382   -0.231910   0.043146    
9   0.519973    0.396571    0.643376    -0.094619   -0.218022   0.028783    
10  0.475121    0.366990    0.583251    -0.091467   -0.199598   0.016664    
}{\table}

\begin{document}
\begin{tikzpicture}

    \begin{axis}
    % y_h confidence interval
    \addplot [stack plots=y, fill=none, draw=none, forget plot]   table [x=temps, y=y_h__inf]   {\table} \closedcycle;
    \addplot [stack plots=y, fill=gray!50, opacity=0.4, draw opacity=0, area legend]   table [x=temps, y expr=\thisrow{y_h__sup}-\thisrow{y_h__inf}]   {\table} \closedcycle;
    % subtract the upper bound so our stack is back at zero
    \addplot [stack plots=y, stack dir=minus, forget plot, draw=none] table [x=temps, y=y_h__sup] {\table};

    % y_f confidence interval
    \addplot [stack plots=y, fill=none, draw=none, forget plot]   table [x=temps, y=y_f__inf]   {\table} \closedcycle;
    \addplot [stack plots=y, fill=gray!50, opacity=0.4, draw opacity=0, area legend]   table [x=temps, y expr=\thisrow{y_f__sup}-\thisrow{y_f__inf}]   {\table} \closedcycle;

    % the line plots (y_h and y_f)    
    \addplot [stack plots=false, very thick,smooth,blue]  table [x=temps, y=y_h]   {\table};
    \addplot [stack plots=false, very thick,smooth,blue]  table [x=temps, y=y_f]   {\table};
    \end{axis}

\end{tikzpicture}
\end{document}
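The bookkeeping behind the reset can be written out as a short worked derivation. The stack height s_k after the k-th stacked plot is notation introduced here for illustration; y^inf and y^sup stand for the lower and upper bound columns of one series.

```latex
\begin{align*}
  s_0 &= 0,\\
  % invisible plot of the lower bound
  s_1 &= s_0 + y^{\inf} = y^{\inf},\\
  % filled band of height (upper bound - lower bound)
  s_2 &= s_1 + \bigl(y^{\sup} - y^{\inf}\bigr) = y^{\sup},\\
  % invisible plot with stack dir=minus subtracts the upper bound
  s_3 &= s_2 - y^{\sup} = 0.
\end{align*}
```

Because the stack is back at zero after the third plot, the second confidence interval starts from the same baseline as the first one.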
