[Tex/LaTex] Scatterplot with data from CSV file and trendline

csv pgfplots plot tikz-pgf

I have created a scatterplot with code from this previously asked question.

\documentclass[varwidth=true, border=2pt]{standalone}
\usepackage[utf8]{inputenc} % this is needed for umlauts
\usepackage[ngerman]{babel} % this is needed for umlauts
\usepackage[T1]{fontenc}    % this is needed for correct output of umlauts in pdf
\usepackage[margin=2.5cm]{geometry} %layout  
\usepackage{pgfplots}


\begin{filecontents}{table3.csv}
column 1,column 2
1966,37.51817228
1960,40.56693583
1961,40.71972964
1962,40.97560208
1964,41.11687187
1963,41.25082828
1965,46.02625404
1960,46.22815872
1967,46.67800113
1961,48.39523271
\end{filecontents}


\begin{document}

\begin{tikzpicture}
    \begin{axis}[
            axis x line=middle,
            axis y line=middle,
            enlarge y limits=true,
            %xmin=0, xmax=2150,
            %ymin=0, ymax=600,
            width=15cm, height=8cm,     % size of the image
            grid = major,
            grid style={dashed, gray!30},
            ylabel=steps,
            xlabel=$n$,
            legend style={at={(0.1,-0.1)}, anchor=north}
         ]        
          \addplot[scatter,only marks] table [x=column 1, y=column 2, col sep=comma] {table3.csv};

          %the code below is added via @Peter's comment.
          \addplot[only marks] table [col sep = comma,y={create col/linear regression={y=column 2}}]{table3.csv};


    \end{axis}
\end{tikzpicture}

\end{document}

The scatter plot looks great, but I want to add a trendline. All the trendline examples I've seen compute the line from data entered directly in the .tex file, not from a .csv file.

Is it possible to do this?

My other thought was to calculate the trendline in Excel and then overlay the line on the graph. I'd much rather do it directly in LaTeX, though, as my document has many graphs.

Edit: Jake gave me great guidance on how to do this with data entered directly in the .tex file, but I am having trouble reading the data from the .csv file. I have added to my code above and posted the error message I get in the console.

With the added code I get this error message:

./linearreg.tex:30: Package PGF Math Error: Could not parse input '' as a floating point number, sorry. The unreadable part was near ''..

Line 30 in my document is the added line with the linear regression equation.

Final edit: I figured it out. The error occurred because my file contained columns with blank cells. I had to delete the blank data before the linear regression line could be calculated.

Here is my final MWE:

\documentclass{article}
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{pgfplotstable}

\begin{document}
\pgfplotstableread[col sep = comma]{table4.csv}\loadedtable

\begin{tikzpicture}
    \begin{axis}[
        xlabel=Weight (kg), % label x axis
        ylabel=Height (cm), % label y axis
        axis lines=left, %set the position of the axes
        clip=false
    ]

        \addplot[scatter, only marks] table [x=column 1, y=column 2, col sep=comma] {\loadedtable};
        \addplot[very thick, red] table [col sep = comma,y={create col/linear regression={y=column 2}}]{\loadedtable};

    \end{axis}

\end{tikzpicture}
\end{document}

Best Answer

To get a linear regression line for data from a data file, use

\addplot [no markers] table [y={create col/linear regression={y=<column name>}}] {<file name>};
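
For reference, the created column holds the values of a fitted straight line $y = a x + b$. Assuming the default unweighted fit (no variance columns supplied), the coefficients should be the ordinary least-squares estimates over the $n$ table rows $(x_i, y_i)$. The block below is the textbook formula, given as a sketch of what is computed and not as a quote from the pgfplotstable source:

```latex
\[
  a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
           {n \sum_{i=1}^{n} x_i^2 - \bigl(\sum_{i=1}^{n} x_i\bigr)^2},
  \qquad
  b = \frac{1}{n}\Bigl(\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i\Bigr).
\]
```

Every row has to contribute a numeric $x_i$ and $y_i$ to these sums, which is consistent with the "Could not parse input ''" error reported in the question for a file with blank cells.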

\documentclass[border=5mm]{standalone}
\usepackage{pgfplots, pgfplotstable}
\usepackage{filecontents} % only needed on LaTeX releases older than 2019-10; newer kernels provide this functionality

\begin{filecontents}{table.dat}
x y
0 1
100 3
150 2
200 5
300 6
\end{filecontents}

\begin{document}

\begin{tikzpicture}
    \begin{axis}[
            axis x line=middle,
            axis y line=middle,
            enlarge y limits=true,
            width=15cm, height=8cm,     % size of the image
            grid = major,
            grid style={dashed, gray!30},
            ylabel=steps,
            xlabel=$n$,
            legend style={at={(0.1,-0.1)}, anchor=north}
         ]        
        \addplot[only marks] table  {table.dat};
        \addplot [no markers, thick, red] table [y={create col/linear regression={y=y}}] {table.dat};
    \end{axis}
\end{tikzpicture}

\end{document}
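
If the fitted equation should also appear in the plot, pgfplotstable stores the slope and intercept of the most recent regression in \pgfplotstableregressiona and \pgfplotstableregressionb. The following is a minimal sketch, not part of the answer above, that applies the same technique to a comma-separated file with column names containing spaces, as in the question. The file name trend-demo.csv, the macro name \trenddata, and the compat level are assumptions chosen for illustration.

```latex
% Hypothetical demo file; the name trend-demo.csv is an assumption.
\begin{filecontents*}{trend-demo.csv}
column 1,column 2
0,1
100,3
150,2
200,5
300,6
\end{filecontents*}

\documentclass[border=5mm]{standalone}
\usepackage{pgfplots, pgfplotstable}
\pgfplotsset{compat=1.12} % assumption: any reasonably recent compat level

\begin{document}
% Read the file once; col sep=comma is required because the cells are comma-separated.
\pgfplotstableread[col sep=comma]{trend-demo.csv}\trenddata

\begin{tikzpicture}
    \begin{axis}[legend pos=north west]
        % Raw data points
        \addplot [only marks] table [x=column 1, y=column 2] {\trenddata};
        \addlegendentry{data}
        % Regression line; x and y name the columns used for the fit
        \addplot [no markers, thick, red] table [
            x=column 1,
            y={create col/linear regression={x=column 1, y=column 2}}
        ] {\trenddata};
        % Slope (a) and intercept (b) of the regression computed just above
        \addlegendentry{%
            $\pgfmathprintnumber{\pgfplotstableregressiona} \cdot x
             \pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$}
    \end{axis}
\end{tikzpicture}
\end{document}
```

Loading the table once with \pgfplotstableread mirrors the final MWE in the question and keeps the col sep setting in a single place. As noted there, the file must not contain blank cells in the columns used for the regression.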