[TeX/LaTeX] Pgfplots linear regression (mean square error)

pgfplots, pgfplotstable, tikz-pgf

Pgfplots can compute a linear regression, as shown in the post "Linear regression – trend line with pgfplots".

However, is it possible to compute the mean square error as well?

My first idea would be to create a new column with the square error and then compute the mean. But is it possible to do this inside pgfplots?

Best Answer

You can store the fitted regression values in a new column of your table, compute the error of each row from that column, and then average the squared errors:

\documentclass{article}
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{pgfplotstable}

\usetikzlibrary{calc}

\begin{document}

\pgfmathsetseed{1138} % set the random seed
\pgfplotstableset{ % Define the equations for x and y
    create on use/x/.style={create col/expr={42+5*\pgfplotstablerow}},
    create on use/y/.style={create col/expr={(0.6*\thisrow{x}+130)+5*rand}},
}
% create a new table with 10 rows and columns x and y:
\pgfplotstablenew[columns={x,y}]{10}\loadedtable

% Calculate the regression line
\pgfplotstablecreatecol[linear regression]{regression}{\loadedtable}

% Calculate the errors
\pgfplotstablecreatecol[
    create col/expr={\thisrow{y}-\thisrow{regression}}
    ]{error}{\loadedtable}

% Calculate the average squared error
\pgfmathsetmacro\totalsquarederror{0}
\pgfplotstableforeachcolumnelement{error}\of\loadedtable\as\error{
    \pgfmathsetmacro\totalsquarederror{\totalsquarederror+(\error)^2}
}
\pgfplotstablegetrowsof\loadedtable
\pgfmathsetmacro\meansquarederror{\totalsquarederror/\pgfplotsretval}
Mean squared error: \meansquarederror
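% Optional sketch, not part of the original answer: \pgfmathsetmacro stores
% the raw decimal result, so the same value can also be printed rounded.
% The choice of two decimals here is only an illustration.
\par Mean squared error (rounded): \pgfmathprintnumber[fixed, precision=2]{\meansquarederror}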

\pgfplotstabletypeset[fixed]{\loadedtable}

\begin{tikzpicture}
\begin{axis}[
xlabel=Weight (kg), % label x axis
ylabel=Height (cm), % label y axis
axis lines=left, %set the position of the axes
xmin=40, xmax=105, % set the min and max values of the x-axis
ymin=150, ymax=200, % set the min and max values of the y-axis
clip=false
]

\addplot [only marks] table {\loadedtable};
\addplot [no markers, thick, red,
    error bars/.cd,
        y dir=plus,
        y explicit,
        draw error bar/.code 2 args={
        \edef\expression{($##1!1.414!45:##2$)}
        \fill ##1 rectangle \expression;
        }
    ] table [y=regression, y error expr=\thisrow{error}] {\loadedtable} node [anchor=west] {$\pgfmathprintnumber[precision=2, fixed zerofill]{\pgfplotstableregressiona} \cdot \mathrm{Weight} + \pgfmathprintnumber[precision=1]{\pgfplotstableregressionb}$};
\end{axis}

\end{tikzpicture}
\end{document}
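
For reference, the loop in the listing accumulates the usual mean squared error of the fitted line. As a sketch of the formula, with symbols introduced here for illustration only: a and b are the slope and intercept that pgfplotstable stores in \pgfplotstableregressiona and \pgfplotstableregressionb, n is the number of table rows, and e_i is the entry of the error column in row i.

```latex
\[
  e_i = y_i - (a\,x_i + b),
  \qquad
  \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^{2}
\]
```

In terms of the macros above, \totalsquarederror holds the sum and \pgfplotsretval (set by \pgfplotstablegetrowsof) holds n. The code divides by n, so if you want the unbiased estimate of the residual variance instead, you would divide by n - 2.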