I'm having trouble getting the Unicode variation selector character working.
The Unicode range for Burmese also covers a number of Tai languages like Shan and Aiton. In Aiton, there is a very slight change to the letter forms, handled by the U+FE00 character. U+1000 (က) is the consonant /k/, and U+1000 U+FE00 (က︀) is the variant form, with a tiny dot added on the bottom left.
I have text, in Texpad on macOS, which shows the dots in the editor view (i.e. the variation selector character is working there), but when I compile with XeLaTeX, the output shows up without the dots.
I'm using Noto Sans Myanmar, which does normally show the dots.
MWE:
\documentclass[12pt]{article}
\usepackage{fontspec}
\newfontfamily\myanmar{Noto Sans Myanmar}
% \newfontfamily\myanmar{Noto Sans Myanmar}[Variant=1] makes no difference
\begin{document}
% \addfontfeature{Variant=01} also has no effect
% with dots
{\myanmar က︀ိင︀်ꩫ︀ႃလ︀ႃ}
% no dots
{\myanmar ကိင်ꩫႃလႃ}
\end{document}
Here's a screenshot of the editor view, showing the difference in case it doesn't show up on SE.
This is the output of the MWE, which does not use the variants as needed:
Converting to codepoints explicitly, e.g. \symbol{"1000}\symbol{"fe00}\symbol{"102d}..., breaks it further, and none of the diacritics show up properly. The same happens with \char"1000 etc.
Any ideas on how to resolve this?
Best Answer
Amended answer
With feedback from the comments: the Script= option is enough to activate the variation selectors (VS1 is a glyph, sitting at U+FE00), and the language setting has no effect (in this case).
Plus, fontspec also allows renaming of script tags (and other things), e.g. \newfontscript{Phake}{mym2}, so that the new name can then be used as the value of Script=.
A recent Lualatex install does not need Renderer=HarfBuzz in the font options.
MWE
Xelatex
A couple of hundred glyphs are up in the Private Use Area and all of them have Unicode value = -1 ("bad character"), so accessing them the traditional way (with \UcharNNNN, where NNNN is the Unicode codepoint) doesn't work: neither lualatex nor xelatex can reach them that way.
In Xelatex, there are XeTeX primitives which allow glyph access by their position number in the font.
A TECkit mapping file, or expl3's regex/find-replace ability, will allow a convenient transliteration ASCII input method to work. TECkit mapping uses Unicode codepoints.
MWE
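As an illustration of the XeTeX primitives mentioned above, here is a minimal sketch using \XeTeXglyphindex and \XeTeXglyph. It is not the answer's original MWE. It assumes the font is Noto Serif Myanmar (the font named in the cache path further down) and that it contains a glyph named ka_khm, the name quoted in the Lualatex section.

```latex
% Compile with xelatex only: \XeTeXglyph and \XeTeXglyphindex are XeTeX primitives.
\documentclass[12pt]{article}
\usepackage{fontspec}
\newfontfamily\myanmar{Noto Serif Myanmar}[Script=Myanmar]
\begin{document}
% \XeTeXglyphindex "<name>" looks up the glyph ID for a glyph name in the
% current font; \XeTeXglyph <number> typesets the glyph with that ID.
% The space after the closing quote terminates the name.
{\myanmar \XeTeXglyph\XeTeXglyphindex "ka_khm" }
\end{document}
```

A glyph placed this way bypasses shaping, so it is probably best suited to isolated glyphs rather than running text.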
Lualatex
For reference, glyphs can be accessed by index/key in Lualatex too, using lua code.
Direct access to alternate glyphs, instead of indirectly via font features, variation selectors, etc, can be done using luatex's font cache information and not using HarfBuzz as font renderer. (HarfBuzz renders all glyphs the "normal" way, via ligature tables, substitution tables, etc. - all this information is in the font cache file, too.)
Luatex knows about fonts via its font cache, a set of files containing font information in lua-table structure.
In this case, we can load the .../texlive/2020/texmf-var/luatex-cache/generic/fonts/otl/notoserifmyanmar-regular.lua file with the require command and iterate through, say, the descriptions table and its subtables, to get at this type of entry:
where [983058] is the "internal key", the slot number or Unicode codepoint of the glyph in the font, ["index"] is the glyph ID (GID), ["name"] is obvious, and ["unicode"] is the Unicode codepoint that this glyph relates to (glyph "ka" at 4096). Glyph "ka_khm"'s unicode will be mapped to -1 by the time HarfBuzz sees it with OpenType eyes, so it is not directly accessible that way.
(The glyph is indirectly accessible via, say, its ligature bindings:
so no problems there.)
The trick for direct access to these substitutes and alternates is not to use HarfBuzz as the font-renderer.
Some glyphs will be positioned awkwardly with this method since it places them standalone and therefore out of context.
MWE
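A minimal sketch of the direct-access idea described above, again not the answer's original MWE. It assumes that the internal key 983058 quoted above is valid for the installed copy of Noto Serif Myanmar; the number may well differ between font versions, so check the cache file of your own installation.

```latex
% Compile with lualatex.
\documentclass[12pt]{article}
\usepackage{fontspec}
% Deliberately no Renderer=HarfBuzz: following the answer, direct access
% relies on not using HarfBuzz as the font renderer.
\newfontfamily\myanmar{Noto Serif Myanmar}
\begin{document}
% 983058 is the internal key (slot number) quoted above for "ka_khm".
{\myanmar \char983058}
\end{document}
```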
Original answer
From a hint on the GitHub link, the dots are stored under StylisticSet=1 (ss01). They are accessed via the script option (Script=Myanmar).
Lualatex has to be explicitly told to use the HarfBuzz font-renderer. In Xelatex, it should be the default.
MWE
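A sketch of what such a document might look like (not the original MWE): the font name comes from the question, and the engine test via the iftex package is an addition for illustration, so that the same file can be compiled with either engine.

```latex
% Compile with xelatex or lualatex.
\documentclass[12pt]{article}
\usepackage{iftex}
\usepackage{fontspec}
\ifLuaTeX
  % Lualatex: ask for the HarfBuzz renderer explicitly.
  \newfontfamily\myanmar{Noto Sans Myanmar}[Script=Myanmar, Renderer=HarfBuzz]
\else
  % Xelatex: the script option alone should be enough.
  \newfontfamily\myanmar{Noto Sans Myanmar}[Script=Myanmar]
\fi
\begin{document}
% U+1000 U+FE00 U+102D, written in ^^^^ notation so the selector is visible
{\myanmar ^^^^1000^^^^fe00^^^^102d}
\end{document}
```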