[TeX/LaTeX] Can’t get Unicode symbols in math mode

cyrillic unicode-math

I have been trying for about a week to get the unicode-math package working with XeLaTeX, without success. The PDF is created, but all Cyrillic symbols in math mode are dropped. I tried different math fonts, with no progress at all. In the log file I found this low-level error:

Missing character: There is no [cyrillic letter from input] in font cmmi12!

But all fonts used in the document are Unicode fonts.

Here is the file I want to compile (it is UTF-8, of course).

\documentclass[12pt]{book}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}
\defaultfontfeatures{Ligatures={TeX}}
\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}  

\usepackage{amsmath, amssymb}
\usepackage[russian]{hyperref}

\usepackage{unicode-math}
\setmathfont{Latin Modern Math}

\frenchspacing

\begin{document}
Просто буквы % Plain letters
$$Память: M_{доп}(n) = \Theta(N)$$ % Memory: M_add(n) = \Theta(n)
\end{document}

Looking for your help.

Best Answer

This is not a problem with Cyrillic math characters; if the text were English, the correct input would be

Letters only
\[
\text{Memory: } M_{\textup{add}}(n) = \Theta(N)
\]

because Память and доп are not math. The difference becomes clear when comparing this with the output of

\[
Memory: M_{add}(n) = \Theta(N)
\]


The second formula (the one without \text and \textup) is clearly wrong. Textual subscripts are not math variables, so they should be typeset in the normal upright text font, with either \textnormal or \textup (the latter is shorter). Of course, you can define your own command for them.
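As a minimal sketch of such a command: the name \tsub below is hypothetical (it is not a standard command), and the example assumes amsmath is loaded so that \textnormal scales correctly in subscripts. It uses the English text so that it compiles with any engine.

```latex
\documentclass{article}
\usepackage{amsmath}

% Hypothetical helper (not a standard command): textual subscript in upright text font
\newcommand{\tsub}[1]{_{\textnormal{#1}}}

\begin{document}
\[
\text{Memory: } M\tsub{add}(n) = \Theta(N)
\]
\end{document}
```

With such a helper, changing the style of all textual subscripts later only requires editing one definition.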

Here's the complete example with the Russian text:

\documentclass[12pt]{book}

\usepackage{amsmath,amssymb}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}

\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}  

\usepackage[russian]{hyperref}

\setmathfont{Latin Modern Math}

\frenchspacing

\begin{document}

Просто буквы % Plain letters
\[
\text{Память: } M_{\textnormal{доп}}(n) = \Theta(N)
\]
\end{document}


It would be a different problem if you wanted to use a Cyrillic letter as a math variable, but that is not your case.

If you do need Cyrillic letters as math variables, here's a way to set them up:

\documentclass[12pt]{book}

\usepackage{amsmath,amssymb}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}

\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}  

\usepackage[russian]{hyperref}

\setmathfont{Latin Modern Math}

\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}
\newcommand{\makecyrmathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symcyrletters\endcsname\space #1
}
\count255="409
\loop\ifnum\count255<"44F
  \advance\count255 by 1
  \makecyrmathletter{\count255}
\repeat

\begin{document}
\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}
\]
\end{document}


What does \makecyrmathletter do? Let's review it. It takes an integer as its argument and performs some magic. It is used in the loop above, whose first cycle is

\makecyrmathletter{\count255}

with \count255 having the value "410 (hexadecimal), which corresponds to А U+0410 CYRILLIC CAPITAL LETTER A.

In order to understand the code, I'll assume the explicit value is passed. The first level expansion is then

\begingroup\lccode`a="410\lowercase{\endgroup
\Umathcode`a}="0 \csname symcyrletters\endcsname\space "410

The strange \begingroup construction is used to obtain the letter from the number, because we can loop through numbers but not through letters. Inside the group, the \lccode of the letter a is set to "410 (the backtick notation is called an “alphabetic constant”). With this setting, \lowercase scans through its argument and changes every character token into its “lowercase” counterpart, which really means whatever the \lccode table says. The result is then put back to be read again. Hence we obtain

\endgroup\Umathcode`А="0 \csname symcyrletters\endcsname\space "410

(only the a is changed, control sequences pass through \lowercase with no change). The \endgroup does its job, namely to revert the change of \lccode`a to what it was before, and vanishes.
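The \lowercase trick can be tried in isolation. The following is a minimal sketch, assuming a Unicode engine (XeLaTeX or LuaLaTeX, since the \lccode value "410 is out of range for 8-bit engines); the macro name \cyrA is hypothetical and serves only to capture the converted character.

```latex
\documentclass{article}

\begingroup
\lccode`a="410 % inside the group, the "lowercase" of a is U+0410
\lowercase{\endgroup\def\cyrA{a}}% becomes \endgroup\def\cyrA{<U+0410>}

% Write the meaning of the hypothetical macro \cyrA to the log and terminal
\typeout{\meaning\cyrA}

\begin{document}
Test
\end{document}
```

The control sequence name \cyrA is left alone even though it contains an a, because \lowercase only changes character tokens. The \def itself survives the \endgroup because it is executed after the group has been closed.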

Then the \Umathcode assignment is performed. It assigns А a math code, that is, a new interpretation when the character is found in math mode. The = should be followed by three numbers. The first states the type of the object (0 means an ordinary symbol). The second tells XeTeX which font family to take it from: \csname symcyrletters\endcsname produces the number that was assigned by the earlier \DeclareSymbolFont declaration, so by using the symbolic name we don't need to know which number was actually assigned. The third tells XeTeX which slot the character should be taken from, and we obviously choose "410, a Cyrillic А. The three numbers should be separated by a space, which is explicit in the first case; in the second case we need \space, because a blank typed directly after \endcsname would be ignored (TeX skips spaces that follow a control word). Since expansion is performed when looking for numbers, this \space is transformed into an actual space token.
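To make the three numbers concrete, here is what a single hand-written assignment might look like for one letter, д (U+0434). This is only a preamble fragment that mirrors what the macro produces; it assumes XeLaTeX and the same \DeclareSymbolFont declaration as in the example above.

```latex
% Preamble fragment: assumes XeLaTeX, unicode-math and a text font with Cyrillic glyphs
\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}

% type "0 (ordinary symbol), family number held in \symcyrletters, slot "434 (U+0434)
\Umathcode`д="0 \csname symcyrletters\endcsname\space "434
```

Writing one such line per letter would work but is tedious, which is why the answer loops over the code points instead.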

A simpler loop can be used with expl3:

\ExplSyntaxOn

\int_step_inline:nnn { "410 } { "44F }
 {
  \Umathcode #1 = "0 ~ \use:c{ symcyrletters } ~ #1
 }

\ExplSyntaxOff
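For orientation, this is a sketch of where the expl3 loop would sit in a trimmed-down version of the document above. It assumes XeLaTeX, that the CMU Serif font is installed, and an expl3 release that provides \int_step_inline:nnn; polyglossia and hyperref are left out only to keep the example short.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{unicode-math}
\setmainfont{CMU Serif}
\setmathfont{Latin Modern Math}

% The symbol font must be declared before the loop refers to it
\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}

\ExplSyntaxOn
% #1 runs over the code points "410 to "44F, one assignment per letter
\int_step_inline:nnn { "410 } { "44F }
 {
  \Umathcode #1 = "0 ~ \use:c { symcyrletters } ~ #1
 }
\ExplSyntaxOff

\begin{document}
\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}
\]
\end{document}
```

Here the loop variable is already a number, so the \lowercase trick is not needed: \Umathcode accepts the code point directly.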