The main problem is that XeTeX is not applying math italic correction:
- XITS has italic correction and needs it for proper spacing.
- In the case of Latin Modern, the font does not provide italic correction at all.
- Asana Math lacks italic correction as well, but its glyphs are spaced in a way that reduces the need for it.
- Euler does have italic correction, but its upright design makes the lack of italic correction not very visible (though your example is wrong: you have to pass the math-style=upright option for the Euler alphabets to be used).
Why XeTeX stopped applying italic correction is unclear to me. That part of the code has not been touched for ages, but the code was flawed anyway, so my guess is that it worked by accident and some of the recent math cleanup broke it.
However, there is a workaround: open the XITS font in a font editor (preferably FontForge) and set the width of the space glyph to 0; this will cause the engine to apply italic correction again.
For the curious, whether italic correction is applied depends on whether the interword space of the font (\fontdimen2) is zero, even in the OpenType math branch of the code. That is true for TFM math fonts, but it is not necessarily the case for OpenType fonts.
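To see which case a given font falls into, one can print the value the engine tests. This is only a diagnostic sketch for plain XeTeX: it assumes the tested quantity is the interword space parameter \fontdimen2 (as in TeX's own rule), and the font file name and the name \testmath are placeholders for the font under test.

```tex
% Diagnostic sketch (plain XeTeX): report the interword space of a math font.
% Assumption: latinmodern-math.otf is installed; substitute the font under test.
\font\testmath = "[latinmodern-math.otf]/OT:mode=base;script=math;"
% A non-zero value here is the situation in which the correction is skipped.
\message{interword space (fontdimen2) = \the\fontdimen2\testmath}
\bye
```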
Also, the OpenType math spec diverges from the TeX algorithm on when italic correction should be applied, but it is very vague, and the MS implementation seems to differ from what is actually documented, so it is not very well supported by XeTeX and LuaTeX yet.
Update: XeTeX's master branch handles this better now, until a more robust handling of italic correction is devised.
In classical TeX a number of math mode fonts are used to supply the output glyphs based on the input and, as observed in the question, the relevant \mathcode of the input token. In contrast, when using a Unicode math mode font only one font is used to supply all of the glyphs. As such, rather than the limited number of slots available in a TeX font, there are a large number of math mode-specific entries in a Unicode font.
Both Unicode engines (XeTeX and LuaTeX) provide the primitive \Umathcode for setting the extended math codes required for this to work. Details are available in both the XeTeX and LuaTeX manuals; the syntax is
\Umathcode ⟨char slot⟩ [=] ⟨math type⟩ ⟨fam.⟩ ⟨glyph slot⟩
Notice that there is a requirement to supply a family here but that these will all be the same.
To set up the font dimensions required for math mode working, the engine or a suitable loader has to read the table supplied by the font. In XeTeX this happens as part of the (extended) \font primitive, for example
\font\lmmx = "[latinmodern-math.otf]/OT:mode=base;script=math;"
whilst in LuaTeX a Lua-based loader is required to extend the \font primitive (which out-of-the-box is identical to that in TeX90). (Realistically, the font loader to use with LuaTeX is luaotfload, which is based on that written for ConTeXt but loadable with plain, LaTeX, etc. There is work ongoing to use the HarfBuzz shaper with LuaTeX but this is not at present usable to my knowledge.)
As only one font is in use, conversion between input and output glyphs requires some differences from classical TeX. For example, input such as
$y = mx + c$
will not give italic letters unless they have the correct \Umathcode to point to the 'correct' codepoint. For example, we need
\Umathcode `\y = "7 "1 "1D466
(I'm assuming that we will use font 1 for all glyphs: this is not required.)
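Setting each letter by hand quickly becomes tedious, so the assignment is usually done in a loop. The following is a minimal sketch, assuming plain XeTeX or LuaTeX with e-TeX's \numexpr available and family 1 holding the Unicode math font; \lettercount is a scratch register introduced here purely for illustration. Note the gap in the math italic block: U+1D455 is unassigned, and italic h is found at U+210E instead.

```tex
% Sketch: give every lowercase Latin letter its math-italic code point.
% Assumptions: family 1 holds the Unicode math font; \lettercount is a
% scratch register introduced here for illustration only.
\newcount\lettercount
\lettercount=0
\loop
  % U+1D44E is MATHEMATICAL ITALIC SMALL A; the block runs a-z in order.
  \Umathcode \numexpr`\a+\lettercount\relax = "7 "1
             \numexpr"1D44E+\lettercount\relax
  \advance\lettercount by 1
\ifnum\lettercount<26 \repeat
% U+1D455 is unassigned: italic h lives at U+210E (PLANCK CONSTANT).
\Umathcode `\h = "7 "1 "210E
```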
Operators in Unicode math are scaled by the font shaper directly rather than needing extensible parts. As such, something like \int is defined for Unicode use by
\let\int=∫
with the correct math code then chosen
\Umathcode `∫= "1 "1 `∫
Both XeTeX and LuaTeX have the \Uradical primitive for radicals; LuaTeX also has \Uroot.
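As an illustration of \Uradical, a square root could be set up along the following lines. This is a sketch rather than a definitive definition: it assumes that family 1 holds the Unicode math font and that the primitive takes a family followed by a glyph slot, as described in the engine manuals.

```tex
% Sketch: a Unicode square root in plain XeTeX or LuaTeX.
% Assumption: family 1 holds the Unicode math font; U+221A is SQUARE ROOT.
\def\sqrt{\Uradical "1 "221A }
$\sqrt{x^2 + y^2}$
```

Only one slot is given; the growing of the radical sign is left to the engine and the font.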
An important consequence of using only one font is that, for example, making symbols bold requires that all of the relevant math codes change. Thus setting up something like \bf requires that we map over all code points affected and alter their \Umathcode.
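A rough sketch of such a mapping for the lowercase Latin letters is shown here. The names \mathbold and \boldcount are hypothetical, introduced only for this illustration, and the code assumes that family 1 holds the Unicode math font and that e-TeX's \numexpr is available. The bold block starting at U+1D41A has no gaps, so a plain loop suffices.

```tex
% Sketch: a hypothetical \mathbold switch that remaps a-z to the
% MATHEMATICAL BOLD SMALL letters starting at U+1D41A.
% Assumptions: family 1 holds the Unicode math font; \boldcount and
% \mathbold are names introduced here for illustration.
\newcount\boldcount
\def\mathbold{%
  \boldcount=0
  \loop
    \Umathcode \numexpr`\a+\boldcount\relax = "7 "1
               \numexpr"1D41A+\boldcount\relax
    \advance\boldcount by 1
  \ifnum\boldcount<26 \repeat
}
% The assignments are local, so a group limits their scope:
$x + {\mathbold x}$
```

A full implementation would cover uppercase letters, Greek and digits in the same way, which is why packages keep tables of the affected ranges.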
Whilst only one font is required, it is necessary to define math families two and three to satisfy the engine that sufficient math parameters are available. (This may change, certainly in LuaTeX, as it seems to be a hold-over of code paths from TeX90.) At the same time, script fonts need to be loaded telling the loader what they are. This leads to a minimal font loading set up something like
\font\lmmx = "[latinmodern-math.otf]/OT:mode=base;script=math;" %
\font\lmmvii = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=0;" at 7pt %
\font\lmmv = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=1;" at 5pt %
\textfont1 = \lmmx
\textfont2 = \lmmx
\textfont3 = \lmmx
\scriptfont1 = \lmmvii
\scriptfont2 = \lmmvii
\scriptfont3 = \lmmvii
\scriptscriptfont1 = \lmmv
\scriptscriptfont2 = \lmmv
\scriptscriptfont3 = \lmmv
(Again, I am assuming XeTeX font syntax here.)
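Putting the pieces together, a complete test file might look like the sketch below. It simply combines the font loading lines above with hand-written math codes for the four letters of the earlier example, and it assumes plain XeTeX with latinmodern-math.otf installed.

```tex
% Sketch of a complete plain XeTeX test file, combining the pieces above.
\font\lmmx   = "[latinmodern-math.otf]/OT:mode=base;script=math;"
\font\lmmvii = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=0;" at 7pt
\font\lmmv   = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=1;" at 5pt
% Families 2 and 3 are filled only to satisfy the engine's parameter checks.
\textfont1 = \lmmx  \scriptfont1 = \lmmvii  \scriptscriptfont1 = \lmmv
\textfont2 = \lmmx  \scriptfont2 = \lmmvii  \scriptscriptfont2 = \lmmv
\textfont3 = \lmmx  \scriptfont3 = \lmmvii  \scriptscriptfont3 = \lmmv
% Point the letters used below at their math-italic code points.
\Umathcode `\y = "7 "1 "1D466
\Umathcode `\m = "7 "1 "1D45A
\Umathcode `\x = "7 "1 "1D465
\Umathcode `\c = "7 "1 "1D450
$y = mx + c$
\bye
```

The = and + signs keep their plain TeX math codes here, so they still come from the classical family 0 font.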
As noted in comments, there are a large number of additional font dimensions in Unicode math fonts. LuaTeX gives these names (all listed in the LuaTeX manual), whilst for XeTeX they have numbers and are accessed using \fontdimen.
The TeX90 primitives \delimiter, \mathaccent and \radical all have extended Unicode versions: \Udelimiter, \Umathaccent and \Uradical. Unlike the TeX90 versions, \Udelimiter and \Uradical do not need to point to multiple glyph slots: only one slot is needed and the font shaper is responsible for growing the glyph as required. The syntax of \Umathaccent is significantly extended compared to \mathaccent, certainly for LuaTeX. All three primitives are described in the LuaTeX manual and to a lesser extent in the XeTeX one.
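By way of illustration, delimiters and an accent might be defined as follows. This is a hedged sketch: it assumes the basic three-number form (class, family, slot) for \Udelimiter and \Umathaccent, assumes family 1 holds the Unicode math font, and uses the names \lparen, \rparen and \hataccent purely as examples. The manuals should be checked for the exact syntax in each engine.

```tex
% Sketch: Unicode delimiter and accent definitions (plain XeTeX/LuaTeX).
% Assumptions: family 1 holds the Unicode math font; \lparen, \rparen and
% \hataccent are names introduced here for illustration.
\def\lparen{\Udelimiter "4 "1 "28 }      % class 4 = opening, U+0028
\def\rparen{\Udelimiter "5 "1 "29 }      % class 5 = closing, U+0029
\def\hataccent{\Umathaccent "0 "1 "302 } % U+0302 COMBINING CIRCUMFLEX ACCENT
$\left\lparen \hataccent{x} + 1 \right\rparen$
```

Each definition names a single slot; there is no second "large" variant to supply as there is with \delimiter.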
Best Answer
I think that rather than look at unicode-math it's more natural to look at inputenc's utf8 support (if your main interest is textual accented characters rather than math symbols). That maps Unicode input to classic LaTeX markup. The base LaTeX distribution has a file utf8enc.dfu that contains the mapping as far as it is implemented. As memory constraints are a lot less of a problem now than they were initially, gradually more Unicode characters have been added, and quite a lot more are added in the next release due in a day or so. The current development version is
Which tells you that, for example, \^{u} is Unicode U+00FB
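The effect of that mapping can be checked with a small document. This is a sketch assuming pdfLaTeX with the T1 font encoding; in recent LaTeX releases UTF-8 is already the default input encoding, so the inputenc line may be unnecessary there.

```tex
% Sketch: with inputenc's utf8 option, the character U+00FB is mapped to
% the classic markup \^{u}, so both paragraphs should typeset identically.
% Assumption: pdfLaTeX with T1 encoding.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\begin{document}
d\^{u}

dû
\end{document}
```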