[Tex/LaTex] Unicode math at TeX primitive level

plain-tex tex-core unicode-math

I am looking for information on how to implement a Math OTF font in XeTeX or LuaTeX
at the TeX primitive level (using XeTeX or LuaTeX primitives). There is a LaTeX package, unicode-math.
Unfortunately, this package is LaTeX oriented: its documentation is at the LaTeX user
level and its implementation is in the LaTeX3 language, which is far from TeX primitives.
So I cannot take inspiration from this package.

Where can I find such information?

Let me be more specific about what I am looking for. The principles of math
typesetting in TeX can be summarized as follows:

  • You must set (at least) math families 2 and 3 using the \font, \textfont, \scriptfont and
    \scriptscriptfont primitives. The metrics of such fonts must include more
    \fontdimens than the basic 7 (\fontdimen 8–22 for family 2 and \fontdimen 8–13 for family 3;
    they are used as described in Appendix G of the TeXbook).

  • Each elementary math object can be a "character" declared by \mathcode or a
    "control sequence" declared by \mathchardef. The data point to the type of the
    object (used for horizontal spacing) and to the family plus slot of the font.

  • The \delimiter code must point to a family plus slot. There are pointers
    that create a queue of consecutive characters (from big to bigger brackets).
    These pointers are in the metrics of the font used, and the queue can end with special
    pointers to the components of a bracket of arbitrary size. A similar principle is
    used for the queue of radicals.

  • Objects of \mathop type can point to the family plus slot where the two
    variant sizes of a big operator are. The variants are connected by pointers inside the font
    metrics.
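
For concreteness, these are the kind of classical declarations I have in mind, taken from plain.tex (8-bit families and slots):

\mathcode`\+="202B          % class 2 (binary), family 0, slot "2B
\mathchardef\sum="1350      % class 1 (big operator), family 3, slot "50
\delcode`\(="028300         % small variant: family 0, slot "28; large variant: family 3, slot "00
\def\sqrt{\radical"270370 } % small variant: family 2, slot "70; large variant: family 3, slot "70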

So, this is (roughly speaking) the basic background of math
typesetting at the TeX primitive level. But this information is useless to me
when I am working with (say) texgyrepagella-math.otf. Where are the pointers for the bracket or radical queues? Where are the extended font dimens?

Of course, I know how to load this font using the extended syntax of the \font primitive, I can set a
font feature using the \font primitive, and I know that the primitives
\Umathchar and \Umathchardef exist. But this is very little information. It is
insufficient for declaring basic Unicode math typesetting.

Is there any information like the above, but for Unicode math?

Best Answer

In classical TeX a number of math mode fonts are used to supply the output glyphs based on the input and, as observed in the question, on the relevant \mathcode of the input token. In contrast, when using a Unicode math mode font only one font is used to supply all of the glyphs. As such, rather than the limited number of slots available in a TeX font, there are a large number of math mode-specific entries in a Unicode font.

Both Unicode engines (XeTeX and LuaTeX) provide the primitive \Umathcode for setting the extended math codes required for this to work. Details are available in both the XeTeX and LuaTeX manuals. The syntax is

\Umathcode ⟨char slot⟩ [=] ⟨math type⟩ ⟨fam.⟩ ⟨glyph slot⟩

Notice that a family must be supplied here, but with a single math font it will be the same for every character.

To set up the font dimensions required for math mode working, the engine or a suitable loader has to read the table supplied by the font. In XeTeX this happens as part of the (extended) \font primitive, for example

\font\lmmx = "[latinmodern-math.otf]/OT:mode=base;script=math;"

whilst in LuaTeX a Lua-based loader is required to extend the \font primitive (which out of the box is identical to that in TeX90). Realistically, the font loader to use with LuaTeX is luaotfload, which is based on the one written for ConTeXt but loadable with plain, LaTeX, etc. (There is work ongoing to use the HarfBuzz shaper with LuaTeX, but to my knowledge this is not at present usable.)
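
A minimal sketch of the LuaTeX side, assuming plain LuaTeX with luaotfload installed. The font name \lmmx is only illustrative, and the exact feature string is my assumption and should be checked against the luaotfload documentation:

\input luaotfload.sty % extends \font with OpenType loading
\font\lmmx = {file:latinmodern-math.otf:mode=base;script=math} at 10pt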

As only one font is in use, conversion between input and output glyphs requires some differences from classical TeX. For example, input such as

$y = mx + c$

will not give italic letters unless they have the correct \Umathcode to point to the 'correct' codepoint. For example, we need

\Umathcode `\y =  "7 "1 "1D466

(I'm assuming that we will use family 1 for all glyphs: this is not required.)
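
A minimal sketch of how this mapping could be set up for the whole lowercase alphabet, assuming family 1 holds the Unicode math font. The counter name \letteroffset is mine and not part of any format. The italic h is patched separately because Unicode places it at U+210E rather than in the block starting at U+1D44E:

\newcount\letteroffset
\letteroffset=0
\loop
  % map a..z to U+1D44E.. (mathematical italic small letters), class 7, family 1
  \Umathcode\numexpr`\a+\letteroffset\relax = "7 "1 \numexpr"1D44E+\letteroffset\relax
  \advance\letteroffset by 1
\ifnum\letteroffset<26 \repeat
\Umathcode`\h = "7 "1 "210E % U+1D455 is unassigned; italic h lives at U+210E

The uppercase letters can be handled the same way, starting from U+1D434.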

Operators in Unicode math are scaled by the font shaper directly rather than needing extensible parts. As such, something like \int is defined for Unicode use by

\let\int=∫

with the correct math code then chosen

\Umathcode `∫= "1 "1 `∫
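
Where typing the Unicode character in the source is not wanted, \Umathchardef (mentioned in the question) defines a control sequence with the math code in one step. A sketch, again assuming family 1 holds the math font:

\Umathchardef\sum   = "1 "1 "2211 % class 1 (big operator), family 1, U+2211
\Umathchardef\infty = "0 "1 "221E % class 0 (ordinary), family 1, U+221E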

Both XeTeX and LuaTeX have the \Uradical primitive for radicals: LuaTeX also has \Uroot.

An important consequence of using only one font is that, for example, making symbols bold requires that all of the relevant math codes change. Thus setting up something like \bf requires that we map over all code points affected and alter their \Umathcode.
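
A minimal sketch of such a switch, assuming family 1 holds the math font. The names \alphacount, \mapalphabet, \mathitalic and \mathbolditalic are hypothetical, and only the 52 Latin letters are remapped; a full implementation would also have to cover Greek, digits and so on:

\newcount\alphacount
% #1 = target code point for A, #2 = target code point for a
\def\mapalphabet#1#2{%
  \alphacount=0
  \loop
    \Umathcode\numexpr`\A+\alphacount\relax = "7 "1 \numexpr#1+\alphacount\relax
    \Umathcode\numexpr`\a+\alphacount\relax = "7 "1 \numexpr#2+\alphacount\relax
    \advance\alphacount by 1
  \ifnum\alphacount<26 \repeat}
% italic: U+1D434 (A), U+1D44E (a), with h patched to U+210E
\def\mathitalic{\mapalphabet{"1D434}{"1D44E}\Umathcode`\h = "7 "1 "210E }
% bold italic: U+1D468 (A), U+1D482 (a)
\def\mathbolditalic{\mapalphabet{"1D468}{"1D482}}
\mathitalic
$y = mx + {\mathbolditalic c}$

Because \Umathcode assignments are local, the bold italic mapping here ends with the group, and the italic mapping is restored afterwards.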

Whilst only one font is required, it is necessary to define math families two and three to satisfy the engine that sufficient math parameters are available. (This may change, certainly in LuaTeX, as it seems to be a hold-over of code paths from TeX90.) At the same time, the script and scriptscript fonts need to be loaded in a way that tells the loader what they are. This leads to a minimal font-loading setup something like

\font\lmmx   = "[latinmodern-math.otf]/OT:mode=base;script=math;" %
\font\lmmvii = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=0;" at 7pt %
\font\lmmv   = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=1;" at 5pt %
\textfont1 = \lmmx
\textfont2 = \lmmx
\textfont3 = \lmmx
\scriptfont1 = \lmmvii
\scriptfont2 = \lmmvii
\scriptfont3 = \lmmvii
\scriptscriptfont1 = \lmmv
\scriptscriptfont2 = \lmmv
\scriptscriptfont3 = \lmmv

(Again, I am assuming XeTeX font syntax here.)

As noted in comments, there are a large number of additional font dimensions in Unicode math fonts. LuaTeX gives these names (all listed in the LuaTeX manual), whilst for XeTeX they have numbers and are accessed using \fontdimen.


The TeX90 primitives \delimiter, \mathaccent and \radical all have extended Unicode versions: \Udelimiter, \Umathaccent and \Uradical. Unlike the TeX90 versions, \Udelimiter and \Uradical do not need to point to multiple glyph slots: only one slot is needed and the font shaper is responsible for growing the glyph as required. The syntax of \Umathaccent is significantly extended compared to \mathaccent, certainly for LuaTeX. All three primitives are described in the LuaTeX manual and to a lesser extent in the XeTeX one.
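
As a sketch of how these might be used (family 1 is again assumed to hold the math font, the slots are the Unicode code points of the base glyphs, and the choice of which macros to redefine is mine):

\Udelcode`\( = "1 "28                 % family 1, U+0028
\Udelcode`\) = "1 "29                 % family 1, U+0029
\def\langle{\Udelimiter "4 "1 "27E8 } % class 4 (open), family 1, U+27E8
\def\rangle{\Udelimiter "5 "1 "27E9 } % class 5 (close), family 1, U+27E9
\def\sqrt{\Uradical "1 "221A }        % family 1, U+221A
\def\hat{\Umathaccent "0 "1 "0302 }   % class 0, family 1, combining circumflex
$\left( \sqrt{x^2 + \hat y} \right)$

Note that, in contrast to \delcode and \radical, there is no second family and slot pair for a large variant.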