[TeX/LaTeX] Calculating and showing the sum of squares in a diagram

tikz-pgf

Using some plain old MetaPost code I am able to create a basic plot of a linear regression. The question is: can I create a graphic with e.g. TikZ that lets me specify the coordinates of a few points (black) and two points (or the coefficients of the linear model) for the regression line, and then automatically

  • draws the red lines
  • plots the green points
  • shows a calculated sum of squares in the diagram

It could come in handy to do this directly in LaTeX to show how the parameters of the regression line influence the sum of squared differences.


Best Answer

Here's a solution based on the datatool package:

\documentclass{article}
\usepackage{datatool}
\usepackage{dataplot}

\begin{document}

% define the data set (could also be read from a CSV file)
\DTLnewdb{mydata}
\DTLnewrow{mydata}%
\DTLnewdbentry{mydata}{x}{1}%
\DTLnewdbentry{mydata}{y}{2.3}%
\DTLnewrow{mydata}%
\DTLnewdbentry{mydata}{x}{2}%
\DTLnewdbentry{mydata}{y}{3.4}%
\DTLnewrow{mydata}%
\DTLnewdbentry{mydata}{x}{3}%
\DTLnewdbentry{mydata}{y}{4.1}%


% calculate extra columns 
\DTLforeach{mydata}{%
  \valx=x,%
  \valy=y}{%
  \DTLmul{\result}{\valx}{\valx}%
  \DTLappendtorow{xx}{\result}%
  \DTLmul{\result}{\valx}{\valy}%
  \DTLappendtorow{xy}{\result}%
}

% calculate required averages                 
\DTLmeanforcolumn{mydata}{x}{\mx}
\DTLmeanforcolumn{mydata}{y}{\my}
\DTLmeanforcolumn{mydata}{xx}{\mxx}
\DTLmeanforcolumn{mydata}{xy}{\mxy}
\DTLvarianceforcolumn{mydata}{x}{\vx}

% calculate slope: a = (mean(xy) - mean(x)*mean(y)) / var(x)
\DTLmul{\tmpa}{\mx}{\my}
\DTLsub{\tmpb}{\mxy}{\tmpa}
\DTLdiv{\fita}{\tmpb}{\vx}
\DTLround{\fitar}{\fita}{3}

% calculate intercept: b = (mean(xx)*mean(y) - mean(xy)*mean(x)) / var(x)
\DTLmul{\tmpa}{\mxx}{\my}
\DTLmul{\tmpb}{\mxy}{\mx}
\DTLsub{\tmpc}{\tmpa}{\tmpb}
\DTLdiv{\fitb}{\tmpc}{\vx}
\DTLround{\fitbr}{\fitb}{3}


% end points of the fit line: y = a*x + b evaluated at min(x) and max(x)
\DTLminforcolumn{mydata}{x}{\minx}
\DTLmaxforcolumn{mydata}{x}{\maxx}
\DTLmul{\tmpa}{\minx}{\fita}
\DTLadd{\tmpb}{\tmpa}{\fitb}
\DTLmul{\tmpa}{\maxx}{\fita}
\DTLadd{\tmpc}{\tmpa}{\fitb}


\renewcommand*{\DTLplotatendtikz}{%
  \draw (\minx,\tmpb) -- (\maxx,\tmpc);
}
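
% --- Optional extension: a sketch, not part of the original answer ---
% It adds the residual lines (red), the fitted points (green) and the sum
% of squared residuals asked for in the question.
% Assumptions: \fita (slope) and \fitb (intercept) are defined as above, and
% \DTLplotatendtikz accepts data coordinates, as the definition above relies on.
% The column keys fit and ressq and the macros \tmpd, \fity, \res, \ressq,
% \ssq and \ssqr are names made up for this illustration.

% add the fitted value a*x + b and the squared residual to every row
\DTLforeach{mydata}{%
  \valx=x,%
  \valy=y}{%
  \DTLmul{\tmpd}{\valx}{\fita}%
  \DTLadd{\fity}{\tmpd}{\fitb}%
  \DTLappendtorow{fit}{\fity}%
  \DTLsub{\res}{\valy}{\fity}%
  \DTLmul{\ressq}{\res}{\res}%
  \DTLappendtorow{ressq}{\ressq}%
}

% sum of squared residuals, rounded for display
\DTLsumforcolumn{mydata}{ressq}{\ssq}
\DTLround{\ssqr}{\ssq}{3}

% redraw the fit line, then add residual lines, fitted points and a label;
% the label position (top left) is a guess that suits a positive slope
\renewcommand*{\DTLplotatendtikz}{%
  \draw (\minx,\tmpb) -- (\maxx,\tmpc);
  \DTLforeach*{mydata}{\valx=x,\valy=y,\fity=fit}{%
    \draw[red] (\valx,\valy) -- (\valx,\fity);
    \node[green] at (\valx,\fity) {\textbullet};
  }%
  \node[anchor=north west] at (\minx,\tmpc) {$\sum r_i^2 = \ssqr$};
}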

\begin{figure}[htbp]
\centering
\DTLplot{mydata}{x=x,y=y,width=3in,height=3in}
\caption{The fit function is $\fitar\ x + \fitbr$}
\end{figure}

\end{document}
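
For reference, the slope and intercept computed above appear to follow the usual closed-form least-squares solution. The block below is only a sketch of that correspondence: it assumes that \DTLvarianceforcolumn returns the population variance (division by n, not n - 1), and the symbols a, b and S are labels introduced here, not names from the answer. It needs the amsmath package for align*.

```latex
\begin{align*}
a &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, &
b &= \frac{\overline{x^2}\,\bar{y} - \overline{xy}\,\bar{x}}{\overline{x^2} - \bar{x}^2}, &
S &= \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2 .
\end{align*}
% Worked by hand for the three sample points (1, 2.3), (2, 3.4), (3, 4.1):
%   a = 0.6 / (2/3) = 0.9
%   b = (8.8/9) / (2/3) = 8.8/6, roughly 1.467
%   S = 2/75, roughly 0.027
```

Here S is the sum of squared differences the question asks about; changing a or b away from the values above can only increase it.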