[Tex/LaTex] Pandas to_latex Encoding Issues

Tags: input-encodings, python

I'm using Python and pandas to run a script that generates a LaTeX table from a couple of files I have.

I'm doing this by importing a Java .properties file (encoded in ISO-8859-1, with special characters written as Unicode escapes), referred to as dictionaryFileName, which contains key=value pairs for translations, and mapping those translations onto entries in a CSV file (permissionFileName) that holds the permissions I want to translate. For the CSV, I'm using pandas to import and manage the data. Lastly, I'm exporting the result straight to LaTeX table format with pandas' to_latex().

Here's the Python script I'm using:

import pandas as pd

# Build a translation dictionary from the Java .properties file
# (ISO-8859-1, with \uXXXX escapes).
with open(dictionaryFileName, encoding="ISO-8859-1") as d:
    commands = dict(line.split('=', 1) for line in d)

# Resolve the literal \uXXXX escapes into actual Unicode characters.
commands = {key: value.encode().decode('unicode-escape')
            for key, value in commands.items()}

data = pd.read_csv(permissionFileName)

# Translate the module names and strip the trailing newlines left over
# from the .properties lines.
data.module_name = (data.module_name
                       .map(commands)
                       .replace('\\n', '', regex=True))
grouped_data = data.groupby(['module_name', 'group_name', 'permission_name']).count()

with open("../apendices/permission-table.tex", "w+") as pt:
    pt.write(grouped_data.to_latex(multirow=True, longtable=True, escape=False, encoding="UTF-8"))

I'm exporting it as UTF-8, since that's the encoding I'm using in LaTeX. Then I import the table with \input{apendices/permission-table.tex}, as in the code below:

main.tex:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[english, brazil]{babel}
\usepackage{longtable} % needed by the generated table
\usepackage{multirow}  % needed for \multirow in the generated table
\begin{document}
\input{apendices/permission-table.tex}
\end{document}

permission-table.tex:

\begin{longtable}{lll}
\multirow{14}{*}{Avaliação} & Página de autoavaliação & Ler \\
\end{longtable}

The input files look like this:

dictionaryFile

EVALUATION=Avalia\u00e7\u00e3o

permissionFile

ACTION_PLAN,GROUP_ANALYTICAL_ACTION_PLAN_REPORT,READ
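
As a sanity check, the unicode-escape step in the script does round-trip the dictionaryFile value above as expected; a minimal sketch:

# Minimal check of the unicode-escape decoding used in the script,
# with the example value from dictionaryFile (literal backslash-u
# sequences, exactly as read from the ISO-8859-1 file):
raw = "Avalia\\u00e7\\u00e3o"
print(raw.encode().decode('unicode-escape'))  # -> Avaliação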

However, when I try to compile the LaTeX file, every special character (such as the 'ç' and 'ã' in the example above) triggers the error "Package inputenc Error: Invalid UTF-8 byte sequence.".

Copying the contents of the file, pasting them into a new file, and importing that new file instead solves the problem, though. And if I open the original file in any text editor, all the characters display as they should.
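
A quick way to see what LaTeX actually receives, even though an editor renders the characters correctly, is to dump the file's raw bytes. A minimal sketch, assuming the file ended up in a Latin-1-compatible encoding such as cp1252:

# Sketch: inspect the raw bytes of the generated file. In UTF-8, 'ç'
# is the two bytes 0xC3 0xA7; if the file was written with a Latin-1-
# compatible default encoding instead, 'ç' is the single byte 0xE7,
# which inputenc rejects as an invalid UTF-8 sequence.
with open("../apendices/permission-table.tex", "rb") as f:
    print(f.read()[:200])  # e.g. b'...Avalia\xe7\xe3o...' rather than b'...Avalia\xc3\xa7\xc3\xa3o...'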

If I use \UseRawInputEncoding, the special characters just go missing.

I wanted this to work straight out of Python, because I'll need to update this table a lot and it would be easier if I didn't have to copy one file into another by hand every time.

Best Answer

I ended up figuring it out thanks to the answer from @Marijn.

I actually needed to set the encoding on the output file opened from Python, not on the pandas to_latex() call.

So, changing the open-for-write call to the following code made it work:

with open("../apendices/permission-table.tex", "w+", encoding="UTF-8") as pt:
    pt.write(grouped_data.to_latex(multirow=True, longtable=True, escape=False, encoding="UTF-8"))
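
For context: without an explicit encoding, open() falls back to the platform default (locale.getpreferredencoding()), which on Windows is typically something like cp1252 rather than UTF-8, and that is where the invalid bytes came from. Depending on the pandas version, an equivalent option (a sketch, not tested across releases) is to pass the path straight to to_latex() and let pandas write the file:

# Alternative sketch (assumes a pandas version whose to_latex()
# accepts a file path and an encoding argument): let pandas write
# the file itself as UTF-8.
grouped_data.to_latex("../apendices/permission-table.tex",
                      multirow=True, longtable=True,
                      escape=False, encoding="UTF-8")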