[TeX/LaTeX] PGFPlots: calculate a linear regression, ignoring some data

I know that I can use PGFPlots to calculate a linear regression with the create col/linear regression key.

I have a set of data that is curved for the first few points, but linear for the remainder of the data set. I want to find a linear regression but ignore the first n points from my data file in my regression. Is this possible with PGFPlots?

A contrived example of this data would be:

x     y
0     0
1     15
2     25
3     28
4     30
5     31
6     32
7     33
8     34
9     35
10    36

where the curve is linear starting at x = 4. A plot of this looks like:

Sample plot

One (not great) solution I can think of is to make a new data set without the first n data points and run my regression on this new set.

Best Answer

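The first lines of the table can be skipped with the option skip first n. Because header=false is set as well, the header line counts as one of the skipped lines: skip first n=5 drops the header and the first four data rows, so the regression is computed from the fifth point (x = 4) through the last point. The first plot of the following example shows the regression line over that range. The second plot draws the line over the full range using the calculated regression parameters, \pgfplotstableregressiona (slope) and \pgfplotstableregressionb (intercept).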

\begin{filecontents*}{\jobname-plot.dat}
x     y
0     0
1     15
2     25
3     28
4     30
5     31
6     32
7     33
8     34
9     35
10    36
\end{filecontents*}

\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.12}
\usepackage{pgfplotstable}

\begin{document}
  \begin{tikzpicture}
    \begin{axis}
      \addplot[only marks, mark=*, blue]
        table {\jobname-plot.dat};
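      % Regression over x = 4..10: with header=false, skip first n=5
      % drops the header line and the first four data rows.
      \addplot[]
        table[header=false,skip first n=5,
          y={create col/linear regression},
        ] {\jobname-plot.dat};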
    \end{axis}
  \end{tikzpicture}

  \begin{tikzpicture}
    \begin{axis}
      \addplot[only marks, mark=*, blue]
        table {\jobname-plot.dat};
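      % Invisible plot: only computes the regression parameters.
      \addplot[draw=none]
        table[
          header=false,
          skip first n=5,
          y={create col/linear regression},
        ] {\jobname-plot.dat};
      % Draw the fitted line a*x + b over the full x range.
      \addplot[domain=0:10, red]
        {\pgfplotstableregressiona*x + \pgfplotstableregressionb};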
    \end{axis}
  \end{tikzpicture}
\end{document}

Result
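
As a sanity check on the fit, the retained points (x = 4 through 10) lie exactly on a straight line, so the regression parameters can be worked out by hand. The derivation below is only a sketch using the standard least-squares formulas. The symbols \bar{x}, \bar{y}, S_{xx} and S_{xy} are notation introduced here for illustration, not names defined by PGFPlots.

```latex
\begin{align*}
\bar{x} &= \tfrac{1}{7}(4+5+\dots+10) = 7, &
\bar{y} &= \tfrac{1}{7}(30+31+\dots+36) = 33, \\
S_{xx} &= \sum_{i}(x_i-\bar{x})^2 = 9+4+1+0+1+4+9 = 28, &
S_{xy} &= \sum_{i}(x_i-\bar{x})(y_i-\bar{y}) = 28, \\
a &= \frac{S_{xy}}{S_{xx}} = 1, &
b &= \bar{y} - a\,\bar{x} = 33 - 7 = 26.
\end{align*}
```

So \pgfplotstableregressiona and \pgfplotstableregressionb should come out close to 1 and 26 (up to PGFPlots' numerical accuracy), and the line drawn over the full range is y = x + 26, which is why it departs from the curved first four points.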
