[TeX/LaTeX] Filter rows from a table

pgfplots pgfplotstable

I have a large table containing results for many instances, and I want to produce separate tables and plots grouped by one (or more) fields. In this minimal working example, I want one table and one plot for all instances with 20 houses, and another table and plot for all instances with 200 houses.

Ideally, I would obtain two subtables from the original one, keeping only the rows with houses=20 or houses=200 (as if the data had been split into separate table files). I would prefer not to filter explicitly every time I typeset a table or a plot, or at least to need only minimal code to do so (perhaps by defining a custom style?).

\documentclass{article}

\usepackage{pgfplots}
\pgfplotsset{compat=newest}

\usepackage{pgfplotstable}

\begin{document}

\pgfplotstableread{houses  instance    value
20      1           8919
20      2           8965
20      3           8901
20      4           8816
20      5           8875
20      6           9027
20      7           8915
20      8           8907
20      9           8832
20      10          8934
200     1           84714
200     3           85630
200     4           84748
200     5           84565
200     6           85109
200     7           84588
200     8           84638
200     9           84673
200     10          85170
}{\fulltable}


Table for all instances

\pgfplotstabletypeset{\fulltable}

\begin{tikzpicture}
  \begin{axis}[
      title={Values for all instances},
      xlabel={Instance},
      ylabel={Value}]
    \addplot+[scatter, only marks] table[x=instance, y=value]
      {\fulltable};
  \end{axis}
\end{tikzpicture}

\newpage

Table for 20 houses instances

???

Plot for 20 houses instances

???

\end{document}

Edit

By looking at similar questions, I came up with this partial solution:

%\filtertable{table}{field}{value}{#1}
\newcommand{\filtertable}[4]{
\pgfplotstablegetelem{#4}{#2}\of{#1}
\ifnum\pgfplotsretval=#3\relax
\else\pgfplotstableuserowfalse\fi
}

%\filtertableplot{table}{field}{value}
\newcommand{\filtertableplot}[3]{
\pgfplotstablegetelem{\coordindex}{#2}\of{#1}
\ifnum\pgfplotsretval=#3
\else
\def\pgfmathresult{}
\fi
}

Table for 200 houses instances

\pgfplotstabletypeset[
    columns={instance,value},
    row predicate/.code={\filtertable{\fulltable}{houses}{200}{#1}}
  ]
  {\fulltable}

\begin{tikzpicture}
  \begin{axis}[
      title={Values for 200 houses instances},
      xlabel={Instance},
      ylabel={Value},
      x filter/.code={\filtertableplot{\fulltable}{houses}{200}}]
    \addplot+[scatter, only marks] table[x=instance, y=value]
      {\fulltable};
  \end{axis}
\end{tikzpicture}

Is there a way to avoid repeating the table name, to avoid explicitly using #1, and to wrap the x filter=... and row predicate=... in styles?

Best Answer

You can filter the values to be plotted using the discard if not={<column name>}{<value>} key from the answer to "Is it possible to change the color of a single bar when the bar plot is based on symbolic values?". This allows you to type

\addplot+[only marks, discard if not={houses}{20}] table[x=instance, y=value]
      {fulltable.dat};

to plot only those entries where the houses value is 20. For this, you need to put the following code chunk into your preamble, and plot the data from a file (not from a table created using \pgfplotstableread).

\pgfplotsset{
    discard if not/.style 2 args={
        x filter/.code={
            \edef\tempa{\thisrow{#1}}
            \edef\tempb{#2}
            \ifx\tempa\tempb
            \else
                \def\pgfmathresult{inf}
            \fi
        }
    }
}
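
Because the filter is attached to an individual \addplot, the same file can feed several plots, each with its own filter. The following is a minimal sketch, assuming the discard if not definition above is in the preamble and fulltable.dat is available; the log-scaled y axis, the legend placement, and the legend entries are my own additions for illustration and are not part of the original example.

```latex
% Goes inside the document body; relies on the "discard if not" key defined above.
\begin{tikzpicture}
  \begin{axis}[
      xlabel={Instance},
      ylabel={Value},
      ymode=log,                    % assumption: the two groups differ by an order of magnitude
      legend pos=outer north east]
    % Keep only the rows whose "houses" entry is exactly 20
    \addplot+[only marks, discard if not={houses}{20}]
      table[x=instance, y=value] {fulltable.dat};
    \addlegendentry{20 houses}
    % Keep only the rows whose "houses" entry is exactly 200
    \addplot+[only marks, discard if not={houses}{200}]
      table[x=instance, y=value] {fulltable.dat};
    \addlegendentry{200 houses}
  \end{axis}
\end{tikzpicture}
```

Note that the key compares the two values as strings (\ifx on two expanded macros), so the value passed to the key should be written exactly as it appears in the data file: 20 matches, whereas 20.0 would not.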

For the table, you can do something similar (although the code is a bit trickier). It allows you to type

\pgfplotstabletypeset[discard if not={houses}{20}]{fulltable.dat}

to filter the rows.

\makeatletter
\pgfplotstableset{
    discard if not/.style 2 args={
        row predicate/.code={
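            % Build "\pgfplotstablegetelem{<row>}{<column>}\of" and append the
            % name of the table currently being typeset (expanded once)
            \def\pgfplotstable@loc@TMPd{\pgfplotstablegetelem{##1}{#1}\of}
            \expandafter\pgfplotstable@loc@TMPd\pgfplotstablename
            % Compare the fetched cell with the requested value as strings
            \edef\tempa{\pgfplotsretval}
            \edef\tempb{#2}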
            \ifx\tempa\tempb
            \else
                \pgfplotstableuserowfalse
            \fi
        }
    }
}
\makeatother
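
To shorten the calls further, both keys can be wrapped in a one-argument style with the column name fixed. This is only a sketch: only houses is a hypothetical key name of my choosing (it is not a built-in pgfplots or pgfplotstable key), and it assumes that both discard if not definitions above are already in the preamble.

```latex
% "only houses" is a made-up convenience key, not part of pgfplots or pgfplotstable.
% It assumes both "discard if not" definitions above are already in the preamble.
\pgfplotsset{
    only houses/.style={discard if not={houses}{#1}}   % for \addplot
}
\pgfplotstableset{
    only houses/.style={discard if not={houses}{#1}}   % for \pgfplotstabletypeset
}

% Usage in the document body:
\pgfplotstabletypeset[only houses=200]{fulltable.dat}

\begin{tikzpicture}
  \begin{axis}[xlabel={Instance}, ylabel={Value}]
    \addplot+[only marks, only houses=200]
      table[x=instance, y=value] {fulltable.dat};
  \end{axis}
\end{tikzpicture}
```

With this in place, neither the table name nor #1 appears at the point of use; only the value to filter on is given.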


Complete example:

\documentclass{article}

\usepackage{pgfplots}
\pgfplotsset{compat=newest}

\usepackage{pgfplotstable}
\usepackage{filecontents} % only needed with LaTeX releases before 2019-10-01

\begin{document}

\pgfplotstableread{
houses  instance    value
20      1           8919
20      2           8965
20      3           8901
20      4           8816
20      5           8875
20      6           9027
20      7           8915
20      8           8907
20      9           8832
20      10          8934
200     1           84714
200     3           85630
200     4           84748
200     5           84565
200     6           85109
200     7           84588
200     8           84638
200     9           84673
200     10          85170
}{\fulltable}

\begin{filecontents}{fulltable.dat}
houses  instance    value
20      1           8919
20      2           8965
20      3           8901
20      4           8816
20      5           8875
20      6           9027
20      7           8915
20      8           8907
20      9           8832
20      10          8934
200     1           84714
200     3           85630
200     4           84748
200     5           84565
200     6           85109
200     7           84588
200     8           84638
200     9           84673
200     10          85170
\end{filecontents}

\pgfplotsset{
    discard if not/.style 2 args={
        x filter/.code={
            \edef\tempa{\thisrow{#1}}
            \edef\tempb{#2}
            \ifx\tempa\tempb
            \else
                \def\pgfmathresult{inf}
            \fi
        }
    }
}

\makeatletter
\pgfplotstableset{
    discard if not/.style 2 args={
        row predicate/.code={
            \def\pgfplotstable@loc@TMPd{\pgfplotstablegetelem{##1}{#1}\of}
            \expandafter\pgfplotstable@loc@TMPd\pgfplotstablename
            \edef\tempa{\pgfplotsretval}
            \edef\tempb{#2}
            \ifx\tempa\tempb
            \else
                \pgfplotstableuserowfalse
            \fi
        }
    }
}
\makeatother

\centering
{\bfseries Table for 20 houses instances:}

\pgfplotstabletypeset[discard if not={houses}{20}]{fulltable.dat}

\begin{tikzpicture}[trim axis left]
  \begin{axis}[
      title={{\bfseries Plot for 20 houses instances}},
      xlabel={Instance},
      ylabel={Value}]
    \addplot+[only marks, discard if not={houses}{20}] table[x=instance, y=value]
      {fulltable.dat};
  \end{axis}
\end{tikzpicture}


\end{document}
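
The same pattern can be inverted to drop the matching rows instead of keeping them. The sketch below swaps the branches of the test; the key name discard if is an assumption made here for illustration, and the usage line is shown as a comment because it belongs inside an axis environment.

```latex
% Hypothetical complement of "discard if not": drops the rows that DO match.
\pgfplotsset{
    discard if/.style 2 args={
        x filter/.code={
            \edef\tempa{\thisrow{#1}}   % cell of column #1 in the current row
            \edef\tempb{#2}             % value to compare against
            \ifx\tempa\tempb
                \def\pgfmathresult{inf} % match: mark the point as unbounded so it is skipped
            \fi
        }
    }
}

% Usage: plot everything except the 200-house instances
% \addplot+[only marks, discard if={houses}{200}]
%   table[x=instance, y=value] {fulltable.dat};
```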