[Tex/LaTex] Why are UTF-8 encoded characters ISO-8859-1 encoded when written to an external file?

auxiliary-files, input-encodings, pdftex, unicode

For a UTF-8 encoded file such as the following:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
%
\newwrite\outtmp
\immediate\openout\outtmp=out.tmp
%
\begin{document}
\newcommand{\foo}{Résumé}
\foo
\immediate\write\outtmp{\foo}%
\end{document}

the resulting out.tmp file is ISO-8859-1 encoded. Why is it:

  • not UTF-8?
  • ISO-8859-1 and not something else?

Best Answer

The main point here is that the input file is not LaTeX but a mixture of LaTeX and plain TeX, or rather TeX primitives (\newwrite, \openout, \write) that are not supported in LaTeX documents. They do appear inside packages and inside the kernel, but using them means one has to understand their (sometimes not properly documented) limitations and the conventions for working around them.

So TeX writes the material as if it were typesetting it, with respect to the current font encoding (not a file encoding); @egreg described that perfectly in his answer.
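
To make that concrete, here is a minimal sketch (my assumptions: pdfTeX with its default 8-bit file output, and the standard t1enc.def, which contains the declaration \DeclareTextComposite{\'}{T1}{e}{233}). Inside a \write, the LICR \'e ends up as the single character in T1 slot 233, and that byte is written verbatim; since the T1 layout largely reuses the Latin-1 positions for the accented Latin letters, the resulting file looks ISO-8859-1 encoded:

\documentclass{article}
\usepackage[T1]{fontenc}
\newwrite\bytetmp
\immediate\openout\bytetmp=byte.tmp
\begin{document}
% Inside \write the LICR \'e expands, via the T1 composite declaration
% \DeclareTextComposite{\'}{T1}{e}{233}, to the character with code 233;
% pdfTeX writes this as the raw byte 0xE9, which is 'é' in ISO-8859-1.
% (Older engines or settings may show ^^e9 instead of the raw byte.)
\immediate\write\bytetmp{\'e}
x% typeset something so the page is not empty
\end{document}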

LaTeX goes a long way to define an internal encoding that can be translated to various font encodings or other output encodings. If you wanted to get UTF-8 output, you would first need to switch to a "font" encoding that outputs UTF-8, i.e. one that translates something like \'e (which is an LICR, a LaTeX Internal Character Representation) into the two UTF-8 bytes. Such an encoding doesn't exist at the moment, but technically it would be possible to set one up properly.
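
Just to illustrate the idea (this is not existing kernel functionality; the macro name \utfbyteseacute below is invented, and a real solution would need one such mapping per supported character, kept separate from the typesetting encoding): one could redirect the T1 composite for \'e so that it produces the two raw UTF-8 bytes of U+00E9 instead of slot 233. A sketch for the preamble, after \usepackage[T1]{fontenc}:

\begingroup
\catcode195=12 \catcode169=12 % treat bytes 0xC3 and 0xA9 as ordinary characters
\gdef\utfbyteseacute{^^c3^^a9}% the two UTF-8 bytes of U+00E9 ('é')
\endgroup
% Redirect the T1 composite for \'e to those two bytes:
\DeclareTextCompositeCommand{\'}{T1}{e}{\utfbyteseacute}

With that in force, the \immediate\write from the question should produce R<C3><A9>sum<C3><A9>, i.e. UTF-8 bytes; but it also changes how \'e is typeset, which is exactly why a dedicated output encoding, rather than a hack on T1, would be the proper way to set this up.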

Using \protected@write is the LaTeX way to write material out transparently in 7-bit form, so that it can be read back in regardless of the input encoding in force at that time.
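
A minimal sketch of the contrast (my assumptions: pdfTeX, and \foo defined here directly in its 7-bit LICR form so that the result does not depend on the inputenc version; note that \protected@write is an internal command, hence the \makeatletter, and that it performs a delayed, non-immediate write that happens at shipout):

\documentclass{article}
\usepackage[T1]{fontenc}
\newwrite\outtmp
\immediate\openout\outtmp=out.tmp
\newcommand{\foo}{R\'esum\'e}% "Résumé" written as 7-bit LICRs
\begin{document}
\foo
\immediate\write\outtmp{\foo}% file gets R<E9>sum<E9>: T1 slots, i.e. Latin-1 bytes
\makeatletter
\protected@write\outtmp{}{\foo}% file gets R\'esum\'e: 7-bit and re-readable
\makeatother
\end{document}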
