I'm considering moving to Xe(La)TeX, mainly because it lets me use Unicode in my LaTeX code, which makes the code easier to read, especially the math.
But I'm confused about how XeTeX typesets symbols based on my input. I see three possibilities:
1. Unicode chars are made active and XeTeX outputs the TeX symbols we know and trust.
2. Unicode chars are piped (directly) through to the final (pdf, ps, dvi) document.
3. A combination of the two.
With one of the advertised points of XeTeX being direct access to system UTF-8 fonts, I'm guessing (2) has a lot to do with it.
Beautiful Typesetting
But TeX has always been about beautiful typesetting, and a lot of effort has gone into making symbols and their spacing look good. Do we still get the same benefits with unicode output? (Do the symbols look the same? Nicer? Worse?) I believe some symbols do not come directly from a font, but have been painstakingly crafted in TeX itself.
Specifics
There are specific math constructs that get special treatment in TeX but also have Unicode symbols. For example, for \cap and \bigcap there are respectively ∩ and ⋂. Do they both behave accordingly? What about √? Or are there packages that implement this sort of thing?
Are most Unicode math symbols interpreted correctly with regard to math spacing (\mathbin, \mathrel, \mathop, \mathopen, \mathclose)?
Do math delimiters derived from unicode ⦃⦄ ⦅⦆ scale vertically as they should?
Are Combining Diacritical Marks handled appropriately?
Portability
Will the output be different when the code is compiled on different systems? Will my generated pdf/ps/dvi look different when viewed on different systems? Or are all relevant fonts automatically included?
unicode-math
Finally, what role does unicode-math play in this story?
Best Answer
XeTeX introduced new primitives such as \Umathcode (called \XeTeXmathcode up to version 0.9998, then renamed for compatibility with LuaTeX), which is the Unicode analog of \mathcode.

What does \mathcode do in traditional TeX? A declaration such as \mathcode`\+="202B tells TeX that a + in math mode should be treated as a binary operation symbol (leftmost hexadecimal digit "2), taken from font family "0 and slot "2B in the corresponding font. In the same vein, one can say something like \Umathcode`\+="2 "0 "2B or even \Umathcode`∑="1 "1 "2211.
The primitive \Umathcode has the syntax \Umathcode<character>=<type> <family> <slot>.

After the (optional) =, three numbers should be given, because the information cannot be packed into a single number the way traditional TeX does it. Actually the information is still packed into a single number (in the ∑ case it's decimal 18883089, hexadecimal "1202211), but the translation from packed number to explicit type-family-slot is not straightforward.

This will probably be accompanied by a similar declaration for \sum, such as \Umathchardef\sum="1 "1 "2211, so that typing $∑$ or $\sum$
will give the same result.

The unicode-math package loads a huge list of symbols and performs assignments similar to the one for ∑. The number corresponding to ∑ will be different, because it depends on many aspects which can't be covered in a short answer.

Actually unicode-math does much more than this, because it sets things up so that commands such as \mathbf or \mathrm give the desired result.

There are other primitives corresponding to the traditional ones, namely \Umathchar, for using a directly specified character; \Udelimiter, for setting delimiters with normal and large variants; \Umathaccent; and finally \Uradical, for defining root symbols. See texdoc xetex, which will open “The XeTeX reference guide” by Will Robertson and Khaled Hosny.
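To see what this means in practice, here is a minimal sketch that assumes XeLaTeX (or LuaLaTeX) with unicode-math installed and its default OpenType math font available. It is not taken from the package documentation. It simply types the same formulas once with Unicode characters and once with control sequences, so the two sides of each equation can be compared in the output.

```latex
% Compile with xelatex (or lualatex); pdflatex will not work.
\documentclass{article}
\usepackage{unicode-math} % no \setmathfont given, so the package default is used

\begin{document}
% Large operator: typed character on the left, control sequence on the right
\[ ∑_{k=1}^{n} k = \sum_{k=1}^{n} k \]

% Binary operation (∩, \cap) versus large operator (⋂, \bigcap)
\[ A ∩ B = A \cap B, \qquad ⋂_{i} A_i = \bigcap_{i} A_i \]
\end{document}
```

If the assignments work as described, both sides of each equation should look identical, with binary-operation spacing around ∩ and limits placed above and below ∑ and ⋂ in display style.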