I'm trying to persuade the unicode-math package to let me insert CJK characters (Chinese ideograms) in math formulæ. Here's a sample input (the first few lines are merely to check that basic functionality and fonts are present):
\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage{unicode-math}
\setmainfont{Linux Libertine O}
\newfontfamily\cjkfont[Script=CJK]{Kochi Mincho}
\setmathfont{XITS Math}
\setmathfont[range={"4E00-"9FFF},Script=CJK]{Kochi Mincho}
\begin{document}
Hello, world! Здравствуй, мир! Unicode est vraiment \emph{épatant} !
\cjkfont{漢字}
$\mathbf{Δ} = (Δ_ι)_{ι∈I}$
$無 = ∅$
\end{document}
The "Linux Libertine O" and "Kochi Mincho" fonts are present on my system, as LinLibertine_R.otf and kochi-mincho-subst.ttf respectively. I'm using lualatex from TeX Live 2012, and the fontspec and unicode-math packages it contains. Everything works fine except the last formula, in which the character 無 simply does not appear.
Now the line
\setmathfont[range={"4E00-"9FFF},Script=CJK]{Kochi Mincho}
was supposed to mean "for all Unicode characters in the range from U+4E00 to U+9FFF (i.e., the main CJK block), use the Kochi Mincho font".
The only clue I have is this warning:
*************************************************
* fontspec warning: "script-not-exist"
*
* Font 'KochiMincho' does not contain script 'CJK'.
*************************************************
But first of all, I can't imagine what it means (the font in question is a CJK font, so obviously it does contain the CJK script, and, indeed, in text mode it works fine); and besides, using the \cjkfont command I defined produces the same warning but still works, so this warning alone cannot be fatal.
Best Answer
The characters to be used in math mode come from CJK languages. In general these characters can be considered ordinary symbols. In the math classification (see also my explanation below) there are two such classes: 0 and 7. Typesetting CJK languages differs from typesetting languages with alphabets. E.g., traditionally CJK languages do not use italics for emphasis (but may have other means of doing so). If italic, bold, and similar shapes do not exist for such a font, so that \mathit, \mathbf, ... cannot be used, then it seems appropriate to choose class 0 instead of class 7. Actually, by default a Unicode character "zzzzzz (in the range "0-"10FFFF) is assigned the \Umathcode "0"0"zzzzzz. Hence, the character is already considered an ordinary symbol of font family 0 and no change should be necessary.
But it seems that \setmathfont (unicode-math, version 0.7c) is not working properly. As a workaround we define the command \adjustmathfont, which uses a counter my@char to step through the range from the first index #1 to the last index #2. At each step we set the font family to the one given by the third argument #3 via \Umathcode\value{my@char} = "0 #3 \value{my@char}. For example, if #1 and #2 are both equal to "7121 and #3 is equal to "4, this just produces \Umathcode"7121="0"4"7121. The full code in a MWE follows.
BTW, the use of \cjkfont could be avoided by an approach like the one shown in this blog. For example, the package fontspec can be replaced by ctex, and \setCJKmainfont{Kochi Mincho} needs to be added. Then \cjkfont is not needed.
Some details about math mode
Math mode has different rules from "normal" text typesetting. In math mode each character is assigned a "mathcode" (hexadecimal "xyzz), which tells TeX how to typeset that character. The mathcode consists of three parts: the "math class" x, the font family y, and the position zz of the character in that font family.
The class x controls several aspects of the typesetting of a character, especially the spacing, and can take the following eight values: 0: ordinary symbol, 1: large operator, 2: binary operator, 3: relation, 4: opening symbol, 5: closing symbol, 6: punctuation, 7: variable family (= ordinary symbol, except that \fam is chosen instead of y if \fam is in the range 0-15). The font family y is from the range 0-15. The position zz is from the range 0-255.
For example, the mathcode of the comma is set by \mathcode`\,="613B, which means that the comma is considered punctuation (class 6) and is typeset using the symbol at position "3B of font family 1. More examples can be found in the file "tex/plain/base/plain.tex".
Nowadays computers are much less restricted than some decades ago. Thus, in the Unicode-aware engines that the package unicode-math relies on (such as LuaTeX), the ranges of the mathcode are extended: the font family to yy (8 bits) and the character position to zzzzzz (ranging from "0 to "10FFFF, about 21 bits) to suit Unicode fonts. The extended fields can be set by \Umathcode"zzzzzz="x"yy"zzzzzz, for example, \Umathcode"2190="3"0"2190 for the left arrow (U+2190) as a relation in font family 0. (For details, see the luatexref documentation mentioned here.)