The error message "Fatal format file error; I'm stymied" means that the TeX binary is trying to load latex.fmt (or pdflatex.fmt), but the version of this TeX binary differs from that of the TeX binary which created the format file. There are two likely reasons: you have installed two TeX binaries (in different versions), or an old latex.fmt from a previous TeX installation is still somewhere on your computer.
The pdflatex.exe must be implemented as a program that runs the TeX binary (most probably pdftex.exe) and tells it: "hey, load pdflatex.fmt". This loading is what breaks, as described above. I don't know exactly how it is implemented in MiKTeX, sorry. I have never used MS Windows.
New TeX distributions have several TeX binaries: "tex", "pdftex", "luatex" and "xetex". If one of them creates file.fmt and another one reads it, then the error message mentioned above occurs too. This is the reason why TeX distributions save each generated file.fmt in a directory specific to the TeX binary that created it, and implement a search mechanism over these directory trees.
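To check the second possible cause (a stale format file left somewhere on disk), it can help to list every .fmt file under the directory trees you suspect. The following is a minimal sketch, not part of any TeX distribution: the function names find_format_files and report_duplicates are made up for illustration, and which directories to scan is something you have to decide for your own installation.

```python
import os
from collections import defaultdict


def find_format_files(roots):
    """Map each .fmt file name to every path where it occurs under roots."""
    found = defaultdict(list)
    for root in roots:
        # os.walk silently skips roots that do not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(".fmt"):
                    found[name].append(os.path.join(dirpath, name))
    return dict(found)


def report_duplicates(found):
    """Keep only the format names that exist in more than one place."""
    return {
        name: sorted(paths)
        for name, paths in found.items()
        if len(paths) > 1
    }
```

Calling report_duplicates(find_format_files([...])) with your own list of directories shows which format names occur more than once. A name that appears in several places is not an error by itself, but a copy outside the current installation is a good candidate for the stale file described above.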
It is indeed true that XeTeX can produce invalid UTF-8 in its error output, and I can reproduce this with the following simpler .tex file:
\documentclass{article}
\begin{document}
应该把 123456789 123456789 123 \textwidth换成
\end{document}
So you can consider this either a bug in XeTeX (for producing invalid UTF-8) or in Pandoc (for incorrectly assuming that XeTeX will produce valid UTF-8).
Unicode and UTF-8
The problem, in short, is that you cannot break a sequence of UTF-8 bytes at an arbitrary place. To take an example, in the string 应该把, the characters are:
应 (U+5E94), encoded in UTF-8 as E5 BA 94
该 (U+8BE5), encoded in UTF-8 as E8 AF A5
把 (U+628A), encoded in UTF-8 as E6 8A 8A
So the string as a whole is encoded in UTF-8 as a sequence of 9 bytes:
E5 BA 94   E8 AF A5   E6 8A 8A
\______/   \______/   \______/
   应         该         把
You can break the byte sequence after 0, 3, 6, or 9 bytes to get a valid string containing 0, 1, 2 or 3 characters respectively. But breaking it at some other place results in invalid UTF-8.
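The claim about valid break points can be checked mechanically. The following short Python sketch (the helper name is_valid_utf8 is mine, chosen for illustration) tries every possible split of the 9 bytes and keeps only those where both halves are still valid UTF-8:

```python
text = "应该把"
data = text.encode("utf-8")  # the 9 bytes shown above


def is_valid_utf8(chunk: bytes) -> bool:
    """Return True if chunk decodes as strict UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A split is safe only if both the prefix and the suffix are valid UTF-8.
safe_splits = [
    i for i in range(len(data) + 1)
    if is_valid_utf8(data[:i]) and is_valid_utf8(data[i:])
]

print(data.hex(" ").upper())  # E5 BA 94 E8 AF A5 E6 8A 8A
print(safe_splits)            # [0, 3, 6, 9]
```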
Unfortunately, that is exactly what XeTeX can do: it can break the byte sequence in some such place, resulting in invalid UTF-8 that Pandoc then fails to cope with (because it assumes valid UTF-8).
Explanation
In the first place, in Unicode-aware engines like XeTeX and LuaTeX, all Unicode characters can be part of control sequences, and there happens to be no control sequence named \textwidth换成, so the system generates an error about an undefined control sequence.
Then, when printing this error to the terminal, TeX tries to show some context around the place where the undefined control sequence \textwidth换成 was encountered, which means printing additional surrounding characters, up to error_line characters. (This limit can be increased; see here and here. Increasing it is a good idea anyway and makes this error less likely, but the error can still happen with sufficiently long lines (and does happen with the example in the question), because the maximum value of error_line is still only 254.)
Unfortunately (and this is the bug), it appears that XeTeX counts bytes and truncates the output without making sure that it breaks only at Unicode code-point boundaries. Look for the procedure show_context in the XeTeX source code, and compare it with print_valid_utf8 in the LuaTeX source code, which is used in LuaTeX's show_context.
In this example, XeTeX picks up only the last two bytes of the first word (the 8A 8A), which is not a valid UTF-8 sequence. That is why iconv and Pandoc complain.
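To illustrate the difference between the two behaviours, here is a rough Python sketch. It is not the code of either engine: tail_bytes and tail_on_boundary are hypothetical names, and the sketch assumes the input itself is valid UTF-8. The first function cuts purely by byte count; the second moves the cut forward past any continuation bytes so that the result starts at a character boundary.

```python
def tail_bytes(data: bytes, n: int) -> bytes:
    """Naive truncation: keep the last n bytes, ignoring character boundaries."""
    return data[-n:] if n > 0 else b""


def tail_on_boundary(data: bytes, n: int) -> bytes:
    """Keep at most the last n bytes, but start at a character boundary."""
    start = max(len(data) - n, 0)
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx; skip over them.
    while start < len(data) and (data[start] & 0xC0) == 0x80:
        start += 1
    return data[start:]


line = "应该把 123".encode("utf-8")  # 9 bytes for the word, 4 for " 123"

naive = tail_bytes(line, 6)       # cuts into the middle of 把
safe = tail_on_boundary(line, 6)  # drops the partial character instead

print(naive.hex(" "))        # 8a 8a 20 31 32 33
print(safe.decode("utf-8"))  # " 123" (with a leading space)
```

The naive result begins with the same stray 8a 8a that shows up in the XeTeX output, while the boundary-aware result is slightly shorter but always decodable.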
Demonstration
The commands I used for compiling the above .tex file with LuaTeX and XeTeX are respectively:
lualatex -interaction=nonstopmode test.tex | iconv -f UTF8
and
xelatex -interaction=nonstopmode test.tex | iconv -f UTF8
With the former (LuaTeX), I get the error message:
! Undefined control sequence.
l.3 ...把 123456789 123456789 123 \textwidth换成
but with the latter (XeTeX), I get an error message that is not valid UTF-8, so iconv fails with
iconv: (stdin):11:7: cannot convert
Without iconv, on my terminal I see printed:
! Undefined control sequence.
l.3 ...?? 123456789 123456789 123 \textwidth换成
and by redirecting the output to a file and viewing it in a raw editor, we can see better what's going on. The following is hexdump output from xxd -g 1 -c 32:
000001c0: 78 29 0a 21 20 55 6e 64 65 66 69 6e 65 64 20 63 6f 6e 74 72 6f 6c 20 73 65 71 75 65 6e 63 65 2e x).! Undefined control sequence.
000001e0: 0a 6c 2e 33 20 2e 2e 2e 8a 8a 20 31 32 33 34 35 36 37 38 39 20 31 32 33 34 35 36 37 38 39 20 31 .l.3 ..... 123456789 123456789 1
00000200: 32 33 20 5c 74 65 78 74 77 69 64 74 68 e6 8d a2 e6 88 90 0a 20 20 20 20 20 20 20 20 20 20 20 20 23 \textwidth.......
Note the 8a 8a (the last two bytes of 把 = E6 8A 8A) just after the ellipsis (2e 2e 2e, meaning ...).
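The same bytes can be fed to a decoder directly to see why strict consumers give up. The sketch below takes the nine bytes around the ellipsis from the hexdump and decodes them in Python, first strictly and then with errors="replace". It only illustrates the failure mode and one lenient way of reading such output; it is not meant to describe how iconv or Pandoc are implemented.

```python
# "..." followed by the stray 8a 8a and " 123", copied from the hexdump.
fragment = bytes.fromhex("2e 2e 2e 8a 8a 20 31 32 33")

try:
    fragment.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    # exc.start is the offset of the first offending byte.
    print(f"strict decoding failed at byte offset {exc.start}")

# A lenient reader substitutes U+FFFD for each byte it cannot decode.
lenient = fragment.decode("utf-8", errors="replace")
print(lenient)
```

Strict decoding fails at offset 3, the first 8a. The lenient version yields the ellipsis, two replacement characters and " 123", which matches the two "?" placeholders seen in the terminal output.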
Best Answer
Set the typesetting engine to »XeLaTeX« in the corresponding pull-down menu of TeXworks (see picture). Alternatively, you can add
% !TEX program = xelatex
as the very first line of the source code, and TeXworks will choose XeLaTeX automatically every time.