[Tex/LaTex] Can’t get unicode symbols in math mode

cyrillic unicode-math

I have been trying for about a week to get the unicode-math package working with XeLaTeX, without success. The PDF is created, but all Cyrillic symbols in math mode are dropped. I tried different math fonts, with no progress at all. In the log file, I found this low-level error:

Missing character: There is no [cyrillic letter from input] in font cmmi12!

But all fonts used in the document are Unicode fonts.

Here is the file I want to compile correctly (it is UTF-8 encoded, of course).

\documentclass[12pt]{book}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}
\defaultfontfeatures{Ligatures={TeX}}
\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}  

\usepackage{amsmath, amssymb}
\usepackage[russian]{hyperref}

\usepackage{unicode-math}
\setmathfont{Latin Modern Math}

\frenchspacing

\begin{document}
Просто буквы % Plain letters
$$Память: M_{доп}(n) = \Theta(N)$$ % Memory: M_add(n) = \Theta(n)
\end{document}

Looking for your help.

Best Answer

This is not a problem of cyrillic math characters; if the text were English, the correct input would be

Letters only
\[
\text{Memory: } M_{\textup{add}}(n) = \Theta(N)
\]

because Память and доп are not math. The difference becomes clear when comparing this with the output of

\[
Memory: M_{add}(n) = \Theta(N)
\]

The bottom formula is clearly wrong. Textual subscripts are not math variables, so they should be typeset in the normal text font (upright), thus either \textnormal or \textup (the latter is shorter). Of course, you can define your own command for them.
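As a minimal sketch of such a command, assuming the same fonts as in the question: the name \tsub is hypothetical (it is not part of any package), and it simply wraps its argument in a subscript set with \textnormal.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage{russian}
\setmainfont{CMU Serif}
\setmathfont{Latin Modern Math}

% \tsub is a hypothetical helper name: a textual (upright) subscript.
% With amsmath loaded, \textnormal scales to subscript size in math mode.
\newcommand{\tsub}[1]{_{\textnormal{#1}}}

\begin{document}
\[
\text{Память: } M\tsub{доп}(n) = \Theta(N)
\]
\end{document}
```

Keeping the markup in one macro means the style of all textual subscripts can be changed later in a single place.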

Here's the complete example:

\documentclass[12pt]{book}

\usepackage{amsmath,amssymb}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}

\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}  

\usepackage{color}
\usepackage{minted}
\usepackage[russian]{hyperref}

\setmathfont{Latin Modern Math}

\frenchspacing

\begin{document}

Просто буквы % Plain letters
\[
\text{Память: } M_{\textnormal{доп}}(n) = \Theta(N)
\]
\end{document}

It would be a different problem if you wanted to use a cyrillic letter as a math variable, but your case is not this one.

If you need cyrillic letters as math variables, here's a way to set them up:

\documentclass[12pt]{book}

\usepackage{amsmath,amssymb}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}

\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}  

\usepackage[russian]{hyperref}

\setmathfont{Latin Modern Math}

\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}
\newcommand{\makecyrmathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symcyrletters\endcsname\space #1
}
\count255="409
\loop\ifnum\count255<"44F
  \advance\count255 by 1
  \makecyrmathletter{\count255}
\repeat

\begin{document}
\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}
\]
\end{document}

What does \makecyrmathletter do? Let's review it. The idea is that it takes as argument an integer and performs some magic. We use it in the following loop where the first cycle is

\makecyrmathletter{\count255}

with \count255 having the value "410 (hexadecimal), which corresponds to А U+0410 CYRILLIC CAPITAL LETTER A.

In order to understand the code, I'll assume the explicit value is passed. The first level expansion is then

\begingroup\lccode`a="410\lowercase{\endgroup
\Umathcode`a}="0 \csname symcyrletters\endcsname\space "410

The strange \begingroup construction is used to obtain the letter from the number: we can loop through numbers, not letters. So inside the group, the \lccode of the letter a (the backtick notation is called an “alphabetic constant”) is set to "410. With this setting, \lowercase will scan through its argument, changing every character token into its “lowercase” counterpart, where “lowercase” means whatever the \lccode table says. Then the result will be delivered to be scanned again. Hence we obtain

\endgroup\Umathcode`А="0 \csname symcyrletters\endcsname\space "410

(only the a is changed, control sequences pass through \lowercase with no change). The \endgroup does its job, namely to revert the change of \lccode`a to what it was before, and vanishes.

Then the \Umathcode assignment is performed. It assigns А a math code, that is, a new interpretation when found in math mode. The = should be followed by three numbers. The first one states the type of the object (0 means an ordinary symbol); the second one tells XeTeX from what font family to take it. \csname symcyrletters\endcsname produces the number that has been assigned with the previous \DeclareSymbolFont declaration. Using a symbolic name, we don't need to know what number is actually assigned. The third number tells XeTeX what slot the character should be taken from and we obviously choose "410, so a Cyrillic А. The three numbers should be separated by a space, which is explicit in the first case; we need \space in the second case, because a blank space typed after \endcsname would be ignored. Since expansion is performed when looking for numbers, this \space is transformed into an actual space token.
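To make the assignment concrete, here is a sketch of what a single iteration amounts to, written out by hand for one letter. The choice of д (U+0434) and the stripped-down preamble are assumptions made for illustration; the declaration of cyrletters is the same as above.

```latex
\documentclass{article}
\usepackage{unicode-math}
\setmainfont{CMU Serif}
\setmathfont{Latin Modern Math}

\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}

% One explicit assignment instead of the loop:
%   "0                               -> class 0, an ordinary symbol
%   \csname symcyrletters\endcsname  -> the family number of cyrletters
%   "434                             -> the slot, U+0434 (Cyrillic small letter de)
\Umathcode "434 = "0 \csname symcyrletters\endcsname\space "434

\begin{document}
\[
д^{2}
\]
\end{document}
```

Only д is usable as a math variable in this sketch; the loop repeats the same assignment for every slot from "410 to "44F.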

A simpler loop can be used with expl3:

\ExplSyntaxOn

\int_step_inline:nnn { "410 } { "44F }
 {
  \Umathcode #1 = "0 ~ \use:c{ symcyrletters } ~ #1
 }

\ExplSyntaxOff

Related Solutions

[Tex/LaTex] What Unicode Font is Best Suited for Math Symbols using Font-Locking under Emacs

As suggested by egreg, I will turn my comments into an answer to at least part of your question:

One would think that perhaps the Computer Modern fonts themselves might exist in some form (encoding + format) that would allow them to be used from within Emacs, and if so has anyone had success using them for this purpose?

You could use the CM-Unicode fonts, which can be installed under Windows at the OS level and are therefore usable from all applications that use the system fonts. I use these fonts in Inkscape, Word and PowerPoint.

Quoting from the CM-Unicode homepage:

Computer Modern Unicode fonts were converted from metafont sources using mftrace with autotrace backend and fontforge (former pfaedit). Their main purpose is to create free good quality fonts for use in X applications supporting many languages. Currently the fonts contain glyphs from Latin1 (Metafont ec, tc, vnr), Cyrillic (lh) and Greek (cbgreek when available) code sets and IPA extensions (from tipa).

You also ask about the STIX fonts. These fonts are also available in OTF format, so it should be easy to install them at the system level.

[Tex/LaTex] Computer Modern fonts and Cyrillic letters

Polyglossia loads fontspec, but the default font is Latin Modern, which has no support for Cyrillic.

You can load the Computer Modern Unicode fonts instead:

\documentclass{article}
\usepackage{fontspec}

\setmainfont[Ligatures=TeX]{CMU Serif}

\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{russian}

\newcommand{\RU}[1]{\foreignlanguage{russian}{#1}}

\begin{document}
\noindent
\textrm{Hello! \RU{Привет!}}\\
\textit{Hello! \RU{Привет!}}\\
\textbf{Hello! \RU{Привет!}}\\
\textbf{\textit{Hello! \RU{Привет!}}}\\
\textsl{Hello! \RU{Привет!}}\\
\textsc{Hello! \RU{Привет!}}
\end{document}

I'd still use language changing commands, even if the font directly supports Cyrillic, because hyphenation would be incorrect (or missing) otherwise.

If you don't have the CMU fonts installed as system fonts, you need a different way to call them, assuming your TeX Live has them.

\documentclass{article}
\usepackage{fontspec}

\setmainfont[
  Ligatures=TeX,
  Extension=.otf,
  BoldFont=cmunbx,
  ItalicFont=cmunti,
  BoldItalicFont=cmunbi,
]{cmunrm}

\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{russian}

\newcommand{\RU}[1]{\foreignlanguage{russian}{#1}}

\begin{document}
\noindent
\textrm{Hello! \RU{Привет!}}\\
\textit{Hello! \RU{Привет!}}\\
\textbf{Hello! \RU{Привет!}}\\
\textbf{\textit{Hello! \RU{Привет!}}}\\
\textsl{Hello! \RU{Привет!}}\\
\textsc{Hello! \RU{Привет!}}
\end{document}

There's no slanted CMU font, so the call to \textsl becomes the same as \textit. One could artificially slant the upright font.
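One possible way to slant the upright font artificially is fontspec's FakeSlant feature, applied to the slanted shape only. This is a sketch, not a tested recommendation: the slant factor 0.2 is an assumed value chosen for illustration, and the result is a mechanical slant rather than a designed typeface.

```latex
\documentclass{article}
\usepackage{fontspec}

\setmainfont[
  Ligatures=TeX,
  % Use the upright font for the slanted shape ...
  SlantedFont={CMU Serif},
  % ... and slant it artificially (0.2 is an assumed factor).
  SlantedFeatures={FakeSlant=0.2},
]{CMU Serif}

\begin{document}
\noindent
\textit{Hello! Привет!}\\
\textsl{Hello! Привет!}
\end{document}
```

With this setup \textsl no longer falls back to the italic shape, so the two lines should differ in the output.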
