[Tex/LaTex] Using CJK (Chinese/kanji) characters in math using unicode-math on lualatex

cjkfontspecluatexmath-modeunicode-math

I'm trying to persuade the unicode-math package to let me insert CJK characters (Chinese ideograms) in math formulæ. Here's a sample input (the first few lines are merely to check that basic functionalities and fonts are present:

 \documentclass[a4paper]{article}
 \usepackage{fontspec}
 \usepackage{unicode-math}
 \setmainfont{Linux Libertine O}
 \newfontfamily\cjkfont[Script=CJK]{Kochi Mincho}
 \setmathfont{XITS Math}
 \setmathfont[range={"4E00-"9FFF},Script=CJK]{Kochi Mincho}
 \begin{document}
 Hello, world!  Здравствуй, мир!  Unicode est vraiment \emph{épatant} !

 \cjkfont{漢字}

 $\mathbf{Δ} = (Δ_ι)_{ι∈I}$

 $無 = ∅$
 \end{document}

The "Linux Libertine O" and "Kochi Mincho" fonts are present on my system, respectively as LinLibertine_R.otf and kochi-mincho-subst.ttf. I'm using lualatex from TeXlive 2012, and the fontspec and unicode-math packages it contains. Everything works fine except the last formula, in which the character 無 simply does not appear.

Now the line

 \setmathfont[range={"4E00-"9FFF},Script=CJK]{Kochi Mincho}

was supposed to mean "for all Unicode characters in the range from U+4E00 to U+9FFF (i.e., the main CJK block), use the Kochi Mincho font".

The only clue I have is this warning:

*************************************************
* fontspec warning: "script-not-exist"
* 
* Font 'KochiMincho' does not contain script 'CJK'.
*************************************************

But first of all I can't imagine what it means (the font in question is a CJK font, so obviously it does contain the CJK script, and, indeed, in text mode it works fine); and besides, using the \cjkfont command I defined produces the same warning but still works, so this warning alone cannot be fatal.

Best Answer

The characters to be used in math mode are from CJK languages. In general these characters can be considered as ordinary symbols. According to the math classification -- see also my explanation below! -- there are two such classes: 0 and 7. Typesetting of CJK languages is different from typesetting languages with alphabets. E.g., traditionally CJK languages do not use italics for emphasis (but may have other means to do so). If italics, bold shape ... do not existing for such a font and \mathit, \mathbf, ... cannot be used then it seems appropriate to choose class 0 instead of class 7. Actually by default, a unicode character "zzzzzz ("0 - "10FFFF) is assigned Umathcode "0"0"zzzzzz. Hence, the character is already considered as an ordinary symbol of font family 0 and no change is necessary.

But it seems that \setmathfont (unicode-math, version 0.7c) is not working properly. As a workaround we define the command \adjustmathfont that uses a counter my@char to steps through the range from the first index #1 to the last index #2. At each step we adjust the font family by \Umathcode\value{my@char} = "0 #3 \value{my@char} to the font family given by the third argument #3. For example, if #1 and #2 are equal to "7121 and #3 is equal to "4 this just produces \Umathcode"7121="0"4"7121. The full code in a MWE follows.

\documentclass{article}
\usepackage{fontspec}
\usepackage{unicode-math}
\setmainfont{Linux Libertine O}
\newfontfamily\cjkfont{Kochi Mincho}

%------ workaround ------
\usepackage{etoolbox}
\makeatletter

%usage: \adjustmathfont{arg1}{arg2}{arg3}
%   where  arg1 is the beginning of the unicode range, e.g. "4E00
%          arg2 is the end of the unicode range, e.g. "9FFF
%          arg3 is the font number, e.g. "4
\newcounter{my@char}
\newcommand{\adjustmathfont}[3]{%
  \ifnumgreater{#1}{#2}{%
    \PackageWarning{}{No adjustment of math font since #1 is greater than #2.}
  }{
    \setcounter{my@char}{#1}
    \Umathcode\value{my@char}="0 #3 \value{my@char}
    \whileboolexpr{%
      test {\ifnumless{\value{my@char}}{#2}} 
    }{%
      \stepcounter{my@char}
      \Umathcode\value{my@char}="0 #3 \value{my@char}
    }
  }
}
\makeatother
%------------------------

\setmathfont{XITS Math}
\setmathfont[range={"4E00-"9FFF}]{Kochi Mincho}
%the new math font (here "Kochi Mincho") might use font number 4 or higher;
%please see @Gro-Tsen's comment how to automate this;
\adjustmathfont{"4E00}{"9FFF}{"4}

\begin{document}
Hello, world!  Здравствуй, мир!  Unicode est vraiment \emph{épatant}!  \cjkfont{漢字}

$\mathbf{Δ} = (Δ_ι)_{ι∈I}$  $無_無^無 = ∅$
\end{document}

BTW, the usage of \cjkfont could be avoided by using an approach as shown in this blog. For example, the package fontspec can be replaced by ctex and \setCJKmainfont{Kochi Mincho} needs to be added. Then \cjkfont is not needed.


Some details about math mode

Math mode has different rules from "normal" text typesetting. In math mode each character is assigned a "mathcode" (hexadecimal "xyzz), which tells how to print that character. The mathcode consists of three parts: the "math class" x, the font family y, the position zz of the character in that font family.

The class x controls several aspects of typesetting of a character, especially the spacing, and can take following eight values: 0: ordinary symbol, 1: large operator, 2: binary operator, 3: relation, 4: opening symbol, 5: closing symbol, 6: punctuation, 7: variable family (= oridnary symbol except that \fam is choosen instead of y if \fam in the range 0-15). The font family y is from the range 0-15. The position zz is from the range 0-255.

For example, the mathcode of the symbol \, is set by \mathcode`\,="613B which means that \, is considered as punctuation and typeset by using the symbol "3B of font family 1. More examples can be found in the file "tex/plain/base/plain.tex".

Nowadays computers are much less restricted than some decades ago. Thus, by using the package unicode-math the ranges of the mathcode are extended: for the font family to yy (8 bits) and for the charater positions to zzzzzz (ranging "0 to "10FFFF, about 21 bits) to suit Unicode fonts. The extended fields can be set by \Umathcode"zzzzzz="x"yy"zzzzzz, for example, \Umathcode\leftarrow="3"0"02190. (For details, see the luatexref documentation mentioned here.)