[Tex/LaTex] Pgfplots linear regression (mean square error)

pgfplots pgfplotstable tikz-pgf

Pgfplots can compute a linear regression, as shown in the post "Linear regression – trend line with pgfplots".

However, is it possible to compute the mean squared error?

My first idea would be to create a new column with the squared error and then compute the mean. But is it possible to do this inside pgfplots?

Best Answer

You can calculate the regression line in a new column of your table, then calculate the error in another column, and finally average the squared errors:

\documentclass{article}
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{pgfplotstable}

\usetikzlibrary{calc}

\begin{document}

\pgfmathsetseed{1138} % set the random seed
\pgfplotstableset{ % Define the equations for x and y
    create on use/x/.style={create col/expr={42+5*\pgfplotstablerow}},
    create on use/y/.style={create col/expr={(0.6*\thisrow{x}+130)+5*rand}},
}
% create a new table with 10 rows and columns x and y:
\pgfplotstablenew[columns={x,y}]{10}\loadedtable

% Calculate the regression line
\pgfplotstablecreatecol[linear regression]{regression}{\loadedtable}

% Calculate the errors
\pgfplotstablecreatecol[
    create col/expr={\thisrow{y}-\thisrow{regression}}
    ]{error}{\loadedtable}

% Calculate the average squared error
\pgfmathsetmacro\totalsquarederror{0}
\pgfplotstableforeachcolumnelement{error}\of\loadedtable\as\error{
    \pgfmathsetmacro\totalsquarederror{\totalsquarederror+(\error)^2}
}
\pgfplotstablegetrowsof\loadedtable
\pgfmathsetmacro\meansquarederror{\totalsquarederror/\pgfplotsretval}
Mean squared error: \meansquarederror

\pgfplotstabletypeset[fixed]{\loadedtable}

\begin{tikzpicture}
\begin{axis}[
xlabel=Weight (kg), % label x axis
ylabel=Height (cm), % label y axis
axis lines=left, %set the position of the axes
xmin=40, xmax=105, % set the min and max values of the x-axis
ymin=150, ymax=200, % set the min and max values of the y-axis
clip=false
]

\addplot [only marks] table {\loadedtable};
\addplot [no markers, thick, red,
    error bars/.cd,
        y dir=plus,
        y explicit,
        draw error bar/.code 2 args={
        \edef\expression{($##1!1.414!45:##2$)}
        \fill ##1 rectangle \expression;
        }
    ] table [y=regression, y error expr=\thisrow{error}] {\loadedtable} node [anchor=west] {$\pgfmathprintnumber[precision=2, fixed zerofill]{\pgfplotstableregressiona} \cdot \mathrm{Weight} + \pgfmathprintnumber[precision=1]{\pgfplotstableregressionb}$};
\end{axis}

\end{tikzpicture}
\end{document}
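
For reference, the quantity computed by the loop above can be written as a formula. This is only a sketch of the arithmetic: the symbols n (number of table rows), x_i, y_i (the entries of columns x and y) and \hat{y}_i (the entries of the regression column) are notation introduced here, not names used by pgfplotstable. The coefficients a and b correspond to \pgfplotstableregressiona and \pgfplotstableregressionb.

```latex
% Notation assumed for illustration: n rows, data (x_i, y_i), fitted values \hat{y}_i
\[
  \hat{y}_i = a\,x_i + b,
  \qquad
  \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2
\]
```

Note that the sum is divided by the number of rows n, as in the code, not by the degrees of freedom n - 2.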

Related Solutions

[Tex/LaTex] How to expand linear regression fit to the full x-axis range while using semilogyaxis

The equation for the regression line for logarithmically transformed data is

Y=exp(b+m*X)

where m and b are your slope and intercept, respectively. So to plot the line, you should use

\addplot {exp(\intercept+\slope*x)};
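
The step from the fitted straight line to the plotted curve can be spelled out as follows. This is a sketch that assumes the regression is carried out on the natural logarithm of the y values, which is what the use of exp in the formula above implies. X and Y denote the original (untransformed) data coordinates.

```latex
% Assumption: the fit is linear in (X, ln Y)
\begin{align*}
  \ln Y &= b + m X \\
  Y     &= \exp(b + m X) = e^{b}\, e^{m X}
\end{align*}
```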

Instead of using an addplot command to determine the slope and intercept, you can do the regression outside of your axis environment using \pgfplotstablecreatecol[linear regression={ymode=log}]{<col name>}{<data table>}. Note that in that case, you have to explicitly set ymode=log. Within a semilogyaxis, this is done automatically.

Here's a complete example:

\documentclass{article}
\usepackage{pgfplots, pgfplotstable}
\begin{document}

\pgfplotstableread{
1   2.3
2   3.4
3   9
4   17
5   30
6   70
7   120
8   250
9   650
}\datatable

\pgfplotstablecreatecol[linear regression={ymode=log}]{regression}{\datatable}
\xdef\slope{\pgfplotstableregressiona} % save the slope parameter
\xdef\intercept{\pgfplotstableregressionb} % save the intercept parameter

\begin{tikzpicture}
\begin{axis}[
    ymode=log,
    xmin=0,xmax=10
]
\addplot [only marks, red] table {\datatable}; % plot the data
\addplot [no markers, domain=0:10] {exp(\intercept+\slope*x)}; 
\end{axis}
\end{tikzpicture}
\end{document}

[Tex/LaTex] Linear regression in a loglog-plot with fixed slope

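You can use gnuplot to do the parameter estimation within PGFPlots. This requires gnuplot to be installed and the document to be compiled with shell escape enabled (for example with pdflatex -shell-escape).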

To estimate both the slope and the intercept, you could use the following \addplot command:

  \addplot [red, raw gnuplot] gnuplot {
   a = -1;
   b = 0.1;
   f(x) = a*x+b;
   fit f(x) 'data.dat' using (log($1)):(log($2)) via a,b;
   set samples 2;
   plot [x=100:10000] exp(f(log(x)));  
  };

This defines the initial parameter values and the equation, and then fits the parameters to the log transformed values found in the data file data.dat. For generating the plot, the number of samples is set to 2 (since we're plotting a straight line), the exponentiation function has to be applied to the function value, and the logarithm has to be taken of the x samples.

To prescribe the slope, change the via a,b in the fit line to via b. That way, a will be kept fixed at its initial value, and only the intercept will be estimated.
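
The model that gnuplot fits here can be written out explicitly. This is a sketch in notation chosen for illustration: x and y stand for the first and second column of the data file, and a and b are the gnuplot parameters from the snippet above. gnuplot's log() is the natural logarithm, which is why exp() is its inverse in the plot command.

```latex
% Assumption: x, y denote the two data columns; the fit is linear in (ln x, ln y)
\begin{align*}
  \ln y &= a \ln x + b \\
  y     &= \exp(a \ln x + b) = e^{b}\, x^{a}
\end{align*}
```

On doubly logarithmic axes this is a straight line with slope a, so fixing a = -1 and fitting only b shifts a line of prescribed slope up or down until it matches the data best.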

Here's an example looking at the convergence of the Monte Carlo approach to estimating Pi (Example 1). The red line uses a theoretical convergence rate of -1, the black line uses the rate estimated from the data.

\documentclass{article}

\usepackage{pgfplots}
\usepackage{filecontents}

\begin{filecontents*}{data.dat}
N   e
100 0.0984
400 0.0316
1600 0.0284
6400     0.00659
10000 0.00359
\end{filecontents*}

\begin{document}
\begin{tikzpicture}
  \begin{axis}[
    xmode=log, ymode=log,
    domain=100:10000
  ]
  \addplot [only marks] table [y=e] {data.dat};
  \addplot [red, raw gnuplot] gnuplot {
   a = -1;
   b = 0.1;
   f(x) = a*x+b;
   fit f(x) 'data.dat' using (log($1)):(log($2)) via b;
   set samples 2;
   plot [x=100:10000] exp(f(log(x)));  
  } node [pos=0.25, above right] {$a=-1$};

  \addplot [raw gnuplot] gnuplot {
   a = -1;
   b = 0.1;
   f(x) = a*x+b;
   fit f(x) 'data.dat' using (log($1)):(log($2)) via a,b;
   set samples 2;
   plot [x=100:10000] exp(f(log(x)));  
  } node [pos=0.25, below left] {$a=-0.67$} ;


  \end{axis}
 \end{tikzpicture}
\end{document}
