Tables – Why Does Pandas DataFrame Table Not Render in TeX File?

pylatex tables

I'm trying to generate a LaTeX file containing a table built from a pandas DataFrame, without success.

I'm using the PyLaTeX library to generate the entire file.

The example code is as follows:

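from pylatex import Document, Section

# df is an existing pandas DataFrame (its to_latex() output is shown below)
geometry_options = {"tmargin": "1cm", "lmargin": "10cm"}
doc = Document(geometry_options=geometry_options)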

with doc.create(Section("Top 10 canales")) as target:
    target.append(df.to_latex())

doc.generate_pdf('full', clean_tex=False)

The output of print(df.to_latex()) is:

\begin{tabular}{lr}
\toprule
{} &    ATCCO \\
\midrule
All channels        &  16395.0 \\
OLATV\_El Trece      &   4421.0 \\
OLATV\_TN            &   2801.0 \\
OLATV\_Telefe        &   2113.0 \\
OLATV\_ESPN 2        &   1632.0 \\
OLATV\_Atcco Canal 2 &   1361.0 \\
OLATV\_America       &    925.0 \\
TV Publica          &    838.0 \\
OLATV\_Fox Sports 2  &    536.0 \\
OLATV\_ESPN          &    497.0 \\
\bottomrule
\end{tabular}

If I replace the Section() statement with a Tabular() statement, compiling to PDF fails with an error.

The final tex file is as follows:

\documentclass{article}%
\usepackage[T1]{fontenc}%
\usepackage[utf8]{inputenc}%
\usepackage{lmodern}%
\usepackage{textcomp}%
\usepackage{lastpage}%
\usepackage{geometry}%
\geometry{tmargin=1cm,lmargin=10cm}%
%
%
%
\begin{document}%
\normalsize%
\section{Top 10 canales}%
\label{sec:Top10canales}%
\textbackslash{}begin\{tabular\}\{lr\}\newline%
\textbackslash{}toprule\newline%
\{\} \&    ATCCO \textbackslash{}\textbackslash{}\newline%
\textbackslash{}midrule\newline%
All channels        \&  16395.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_El Trece      \&   4421.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_TN            \&   2801.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_Telefe        \&   2113.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_ESPN 2        \&   1632.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_Atcco Canal 2 \&   1361.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_America       \&    925.0 \textbackslash{}\textbackslash{}\newline%
TV Publica          \&    838.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_Fox Sports 2  \&    536.0 \textbackslash{}\textbackslash{}\newline%
OLATV\textbackslash{}\_ESPN          \&    497.0 \textbackslash{}\textbackslash{}\newline%
\textbackslash{}bottomrule\newline%
\textbackslash{}end\{tabular\}\newline%

%
\end{document}

The resulting PDF shows the table source as plain text instead of a rendered table.

Best Answer

The output from df.to_latex() is a string containing LaTeX code. This does not fit the normal processing model of PyLaTeX, which is to generate every element of a document with the relevant Python classes (Document(), Section(), Tabular() and so on). Section.append() expects regular text as input, so any LaTeX special characters in that input are escaped: a backslash becomes \textbackslash{}, & becomes \&, braces become \{ and \}, and so on. That is why the table source ends up as literal text in your .tex file.

For this type of situation PyLaTeX has the class NoEscape(), which switches off the special character processing. This is explained further on the manual page https://jeltef.github.io/PyLaTeX/current/usage.html#plain-latex-strings. In this case you need NoEscape(df.to_latex()).
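To make the mechanism concrete, here is a small standard-library sketch of the escaping step. It is a simplified model and not PyLaTeX's actual implementation: the ESCAPES table, the Raw class (standing in for NoEscape) and the render() function are hypothetical names chosen for illustration.

```python
# Simplified model of the escaping step; NOT PyLaTeX's real code.
# ESCAPES covers only the characters that matter for this example.
ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "{": r"\{",
    "}": r"\}",
    "_": r"\_",
}


class Raw(str):
    """Hypothetical marker type: a string that is already valid LaTeX."""


def render(item):
    """Return item as this model would write it to the .tex file."""
    if isinstance(item, Raw):
        return str(item)  # trusted LaTeX, passed through untouched
    # Escape character by character so replacements are never re-escaped.
    return "".join(ESCAPES.get(ch, ch) for ch in str(item))


line = r"\begin{tabular}{lr}"
print(render(line))       # escaped: shows up as literal text in the PDF
print(render(Raw(line)))  # unchanged: LaTeX sees a real tabular environment
```

The first call reproduces the kind of line seen in the generated .tex file from the question; the second shows why wrapping the string in a marker type is enough to get a real table.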

MWE:

from pylatex import Document, Package, Section, NoEscape
import pandas as pd

df = pd.DataFrame(
    [["All channels", 16395], ["OLATV_El Trece", 4421.0]],
    columns=["channel", "ATCCO"],
)

geometry_options = {"tmargin": "1cm", "lmargin": "10cm"}
doc = Document(geometry_options=geometry_options)
# to_latex() emits \toprule, \midrule and \bottomrule, which need booktabs
doc.packages.append(Package('booktabs'))

with doc.create(Section("Top 10 canales")) as target:
    # NoEscape marks the string as raw LaTeX so it is inserted unescaped
    target.append(NoEscape(df.to_latex()))

doc.generate_pdf('full', clean_tex=False)


Note that I added the booktabs package to the document because the output of pandas' to_latex() requires it (it uses \toprule, \midrule and \bottomrule). There also seems to be a bug in to_latex() that causes a column with an empty name to be skipped in the output, so I gave the first column a name as well.
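If you are unsure whether a given table string depends on booktabs, a quick check on the generated source can tell you. This is only a sketch: needs_booktabs() and BOOKTABS_RULES are hypothetical helper names, and the check simply looks for the three rule commands that appear in the to_latex() output shown in the question.

```python
# Rule commands that are defined by the booktabs package.
BOOKTABS_RULES = (r"\toprule", r"\midrule", r"\bottomrule")


def needs_booktabs(latex_source: str) -> bool:
    """Return True if the LaTeX source uses any booktabs rule command."""
    return any(rule in latex_source for rule in BOOKTABS_RULES)


# Shortened copy of the table source from the question.
table = "\n".join([
    r"\begin{tabular}{lr}",
    r"\toprule",
    r"{} &    ATCCO \\",
    r"\midrule",
    r"All channels        &  16395.0 \\",
    r"\bottomrule",
    r"\end{tabular}",
])

if needs_booktabs(table):
    print("add Package('booktabs') to the document")
```

In a PyLaTeX script the same condition could guard the doc.packages.append(Package('booktabs')) call from the MWE.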


A workaround for the empty column name is to call to_latex(multicolumn=False); see https://github.com/pandas-dev/pandas/issues/20008. Code:

df = pd.DataFrame(
    [["All channels", 16395], ["OLATV_El Trece", 4421.0]],
    columns=["", "ATCCO"],
)

with doc.create(Section("Top 10 canales")) as target:
    target.append(NoEscape(df.to_latex(multicolumn=False)))

