[Tex/LaTex] Problem with XeTeX input and output encoding

cjk input-encodings xecjk xetex

As we all know, the XeTeX engine uses UTF-8 as its default input and output encoding. However, it provides two new primitives that set the input encoding:

  • \XeTeXinputencoding defines the input encoding of the following text.
  • \XeTeXdefaultencoding defines the input encoding of subsequent files to be read.

Thus, one can use either of these commands to specify the encoding that XeTeX reads, which lets XeTeX handle files saved in other encodings. Unfortunately, XeTeX provides no interface for specifying the output encoding, so it always writes files in UTF-8, even if we put \XeTeXinputencoding or \XeTeXdefaultencoding at the beginning of the master .tex file.

An MWE that shows this problem:

\XeTeXdefaultencoding "GBK"
\documentclass{article}
\begin{document}
\tableofcontents
\section{测试一}
这里是中文测试。
\clearpage
\section{测试二}
这里是中文测试。
\clearpage
\section{测试三}
这里是中文测试。
\clearpage
\section{测试四}
这里是中文测试。
\clearpage
\end{document}

(Note that there is no selection of CJK fonts here.)

If you save this code as GBK (Microsoft calls it CP936) and compile it with XeLaTeX, no error is reported. However, as you might expect, the .toc and .aux files are written in UTF-8.

The fact described above leads to another awkward tangle. The temporary files are written in UTF-8, whereas the master file is saved as GBK. Hence, if we put \XeTeXinputencoding "GBK" at the beginning of the master file, the temporary files will be read as UTF-8; if we use \XeTeXdefaultencoding "GBK" instead, the temporary files will be read as GBK (although they are actually UTF-8).

The error occurs quietly but sadly.

Are there any clues or hints?

Best Answer

If you can't convert to UTF-8 (which is naturally the best option, as it will make life much easier), IMHO the best approach is to add \XeTeXinputencoding "GBK" at the beginning of every file (master + input files) that uses this encoding.
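A minimal sketch of this layout, assuming a hypothetical master file main.tex and a hypothetical input file chapter1.tex, both saved as GBK. As in the question's MWE, no CJK font is selected, so this only illustrates where the encoding declarations go:

```latex
% ---- main.tex (hypothetical name, saved as GBK) ----
\XeTeXinputencoding "GBK"   % applies to the rest of this file only
\documentclass{article}
\begin{document}
\tableofcontents            % the .toc is written in UTF-8 and read back as UTF-8
\section{测试一}
这里是中文测试。
\input{chapter1}            % chapter1.tex declares its own encoding
\end{document}

% ---- chapter1.tex (hypothetical name, saved as GBK) ----
\XeTeXinputencoding "GBK"   % repeated in every GBK-encoded file
\section{测试二}
这里是中文测试。
```

Because \XeTeXdefaultencoding is left untouched here, the .aux and .toc files, which XeTeX writes in UTF-8, should still be read as UTF-8.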

\XeTeXdefaultencoding is simply misnamed: it can't be used to declare the default encoding of a complete project. (But you can use it to declare the encoding of a bundle of files, if you are sure that auxiliary files don't interfere.)
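One way the "bundle of files" case might look is sketched below. It assumes hypothetical GBK-encoded files part1.tex and part2.tex, and assumes that "utf8" is accepted as the encoding name for switching the default back. The idea is to limit the GBK default to the \input calls, so that the .aux and .toc files are read outside that window:

```latex
% ---- main.tex (hypothetical name, saved as GBK) ----
\XeTeXinputencoding "GBK"     % the master file itself is GBK
\documentclass{article}
\begin{document}              % the .aux file is read here, still as UTF-8
\tableofcontents              % the .toc file is read here, still as UTF-8

\XeTeXdefaultencoding "GBK"   % files opened from now on are read as GBK
\input{part1}                 % hypothetical GBK-encoded file
\input{part2}                 % hypothetical GBK-encoded file
\XeTeXdefaultencoding "utf8"  % assumed name; restore before the .aux is re-read

\end{document}                % LaTeX re-reads the .aux file at this point
```

This sketch uses \input rather than \include, since \include creates additional .aux files. If any auxiliary file is opened while the GBK default is active, it would be misread, which is the interference the answer warns about.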