The equation for the regression line for logarithmically transformed data is
Y=exp(b+m*X)
where m and b are your slope and intercept, respectively. So to plot the line, you should use
\addplot {exp(\intercept+\slope*x)};
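As a short sketch of where this form comes from: the regression is an ordinary least-squares line fitted to the points (X, ln Y). The symbols m and b are the slope and intercept named above; the use of the natural logarithm is an assumption here, chosen because it matches the exp in the plotted expression.

```latex
\[
  \ln Y \approx b + mX
  \quad\Longrightarrow\quad
  Y \approx \exp(b + mX),
\]
\[
  (m, b) = \arg\min_{m,\,b} \sum_i \bigl(\ln Y_i - b - mX_i\bigr)^2 .
\]
```

On an axis with a logarithmic y scale, this curve appears as a straight line with slope m.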
Instead of using an \addplot command to determine the slope and intercept, you can do the regression outside of your axis environment using \pgfplotstablecreatecol[linear regression={ymode=log}]{<col name>}{<data table>}. Note that in that case, you have to explicitly set ymode=log. Within a semilogyaxis, this is done automatically.
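For comparison, the in-axis variant mentioned above might look roughly like the following. This is a minimal sketch, not a tested recipe: it assumes the create col/linear regression key of pgfplotstable, and the column headers X and Y are hypothetical names added for illustration (the data values are the ones from the complete example that follows).

```latex
\documentclass{article}
\usepackage{pgfplots, pgfplotstable}
\begin{document}
% Column names X and Y are illustrative; the values match the example data.
\pgfplotstableread{
X Y
1 2.3
2 3.4
3 9
4 17
5 30
6 70
7 120
8 250
9 650
}\datatable
\begin{tikzpicture}
\begin{semilogyaxis}
% Plot the raw data.
\addplot [only marks, red] table [x=X, y=Y] {\datatable};
% Inside semilogyaxis the regression is done on the log-transformed
% y values, so ymode=log does not have to be passed explicitly.
\addplot [no markers] table [x=X, y={create col/linear regression={y=Y}}] {\datatable};
\end{semilogyaxis}
\end{tikzpicture}
\end{document}
```

After the second \addplot, the fitted slope and intercept should be available in \pgfplotstableregressiona and \pgfplotstableregressionb, just as in the approach that runs the regression outside the axis.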
Here's a complete example:
\documentclass{article}
\usepackage{pgfplots, pgfplotstable}
\begin{document}
\pgfplotstableread{
1 2.3
2 3.4
3 9
4 17
5 30
6 70
7 120
8 250
9 650
}\datatable
\pgfplotstablecreatecol[linear regression={ymode=log}]{regression}{\datatable}
\xdef\slope{\pgfplotstableregressiona} % save the slope parameter
\xdef\intercept{\pgfplotstableregressionb} % save the intercept parameter
\begin{tikzpicture}
\begin{axis}[
ymode=log,
xmin=0,xmax=10
]
\addplot [only marks, red] table {\datatable}; % plot the data
\addplot [no markers, domain=0:10] {exp(\intercept+\slope*x)};
\end{axis}
\end{tikzpicture}
\end{document}
You can use gnuplot to do the parameter estimation within PGFPlots.
To estimate both the slope and the intercept, you could use the following \addplot command:
\addplot [red, raw gnuplot] gnuplot {
a = -1;
b = 0.1;
f(x) = a*x+b;
fit f(x) 'data.dat' using (log($1)):(log($2)) via a,b;
set samples 2;
plot [x=100:10000] exp(f(log(x)));
};
This defines the initial parameter values and the equation, and then fits the parameters to the log-transformed values found in the data file data.dat. For generating the plot, the number of samples is set to 2 (since we're plotting a straight line), the exponential function has to be applied to the function value, and the logarithm has to be taken of the x samples.
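The reason for the exp(f(log(x))) construction can be sketched as follows. Here a and b are the fit parameters from the gnuplot code above, and log is taken to be the natural logarithm, as it is in gnuplot.

```latex
\[
  f(u) = a\,u + b, \qquad u = \ln x ,
\]
\[
  \ln y \approx f(\ln x) = a \ln x + b
  \quad\Longrightarrow\quad
  y \approx \exp\bigl(f(\ln x)\bigr) = e^{b}\, x^{a} .
\]
```

So the fitted model is a power law in the original coordinates, which is a straight line with slope a on log-log axes. Two samples are therefore enough to draw it.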
To prescribe the slope, change the via a,b in the fit line to via b. That way, a will be kept fixed at its initial value, and only the intercept will be estimated.
Here's an example looking at the convergence of the Monte Carlo approach to estimating Pi (Example 1). The red line uses a theoretical convergence rate of -1; the black line uses the rate estimated from the data.
\documentclass{article}
\usepackage{pgfplots}
\usepackage{filecontents}
\begin{filecontents*}{data.dat}
N e
100 0.0984
400 0.0316
1600 0.0284
6400 0.00659
10000 0.00359
\end{filecontents*}
\begin{document}
\begin{tikzpicture}
\begin{axis}[
xmode=log, ymode=log,
domain=100:10000
]
\addplot [only marks] table [y=e] {data.dat};
\addplot [red, raw gnuplot] gnuplot {
a = -1;
b = 0.1;
f(x) = a*x+b;
fit f(x) 'data.dat' using (log($1)):(log($2)) via b;
set samples 2;
plot [x=100:10000] exp(f(log(x)));
} node [pos=0.25, above right] {$a=-1$};
\addplot [raw gnuplot] gnuplot {
a = -1;
b = 0.1;
f(x) = a*x+b;
fit f(x) 'data.dat' using (log($1)):(log($2)) via a,b;
set samples 2;
plot [x=100:10000] exp(f(log(x)));
} node [pos=0.25, below left] {$a=-0.67$} ;
\end{axis}
\end{tikzpicture}
\end{document}
Best Answer
You can calculate the regression line in a new column of your table, and then calculate the (square) error from that: