# [TeX/LaTeX] Problem with XeTeX input and output encoding

Tags: cjk, input-encodings, xecjk, xetex

As we all know, the TeX engine XeTeX uses UTF-8 as its default input and output encoding. However, it provides two new primitive control sequences for setting the input encoding:

• \XeTeXinputencoding defines the input encoding of the following text.
• \XeTeXdefaultencoding defines the input encoding of subsequent files to be read.
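
As a minimal sketch of how the two primitives differ in scope: the file names `main.tex` and `chapter-gbk.tex` are hypothetical, and the scoping described in the comments is my understanding of the XeTeX reference, not something I have verified in every setup.

```latex
% main.tex (hypothetical), saved as GBK
\XeTeXinputencoding "GBK"     % the rest of THIS file is decoded as GBK
\documentclass{article}
\begin{document}
\section{测试一}

\XeTeXdefaultencoding "GBK"   % files opened from here on are decoded as GBK
\input{chapter-gbk}           % hypothetical file, saved as GBK
\XeTeXdefaultencoding "utf8"  % files opened later are decoded as UTF-8 again
\end{document}
```

The point of the sketch is that `\XeTeXinputencoding` acts on the file currently being read, while `\XeTeXdefaultencoding` acts only on files opened after it.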

Thus, one can use either of these two commands to specify the encoding of the files XeTeX reads, which lets XeTeX handle files saved in other encodings. Unfortunately, XeTeX provides no interface for specifying the output encoding, so it always writes files in UTF-8, even if we put \XeTeXinputencoding or \XeTeXdefaultencoding at the beginning of the master .tex file.

An MWE that shows this problem:

\XeTeXdefaultencoding "GBK"
\documentclass{article}
\begin{document}
\tableofcontents
\section{测试一}

\clearpage
\section{测试二}

\clearpage
\section{测试三}

\clearpage
\section{测试四}

\clearpage
\end{document}

(Note that there is no selection of CJK fonts here.)

If you save this piece of code as GBK (Microsoft calls it CP936) and compile it with XeLaTeX, no error is reported. Unsurprisingly, however, the .toc and .aux files are written in UTF-8.

The fact described above leads to another awkward consequence. These temporary files are written in UTF-8, whereas the master file is saved as GBK. Hence, if we put \XeTeXinputencoding "GBK" at the beginning of the master file, the temporary files will be read as UTF-8; if we use \XeTeXdefaultencoding "GBK", the temporary files will be read as GBK (although they are actually UTF-8).
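
The two cases can be sketched side by side. This is only an illustration, assuming (as observed above) that the auxiliary files are always written in UTF-8; the file names `main-a.tex` and `main-b.tex` are hypothetical, and the comments restate my reading of the behaviour, not a verified result.

```latex
% Variant A: main-a.tex (hypothetical), saved as GBK
\XeTeXinputencoding "GBK"     % only this file is decoded as GBK
\documentclass{article}
\begin{document}              % main-a.aux (UTF-8 on disk) is decoded as UTF-8
\tableofcontents              % main-a.toc (UTF-8 on disk) is decoded as UTF-8
\section{测试一}
\end{document}
```

```latex
% Variant B: main-b.tex (hypothetical), saved as GBK
\XeTeXdefaultencoding "GBK"   % applies to files opened after this line
\documentclass{article}
\begin{document}              % main-b.aux (UTF-8 on disk) is decoded as GBK
\tableofcontents              % main-b.toc (UTF-8 on disk) is decoded as GBK
\section{测试一}
\end{document}
```

In variant B the encoding used to write the auxiliary files and the encoding used to read them back no longer match, which is where the section titles in the table of contents would be expected to break.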

So the error occurs silently, which is the sad part.

Are there any clues or hints?