Unicode variation selector (U+FE00) not working with Noto Sans Myanmar (Burmese)

Tags: fontspec, unicode, variant, xetex

I'm having trouble getting the Unicode variation selector character to work.

The Unicode range for Burmese also covers a number of Tai languages like Shan and Aiton. In Aiton, there is a very slight change to the letter forms, handled by the variation selector U+FE00. U+1000 (က) is the consonant /k/, and U+1000 U+FE00 (က︀) is the variant form, with a tiny dot added at the bottom left.

I have text in TeXpad on macOS which shows the dots in the editor view (i.e. the variation selector character is working there), but when I compile with XeLaTeX, it shows up without the dots.

I'm using Noto Sans Myanmar, which does normally show the dots.

MWE:

\documentclass[12pt]{article}

\usepackage{fontspec}
\newfontfamily\myanmar{Noto Sans Myanmar}
% \newfontfamily\myanmar{Noto Sans Myanmar}[Variant=1] makes no difference

\begin{document}

% \addfontfeature{Variant=01} also has no effect

% with dots
{\myanmar က︀ိင︀်ꩫ︀ႃလ︀ႃ}

% no dots
{\myanmar ကိင်ꩫႃလႃ}

\end{document}

Here's a screenshot of the editor view, showing the difference in case it doesn't show up on SE.

[Screenshot: TeXpad editor view, with the dotted forms visible]

This is the output of the MWE, which does not use the variants as needed:

[Screenshot: compiled output of the MWE, without the dots]

Converting to codepoints explicitly, e.g. \symbol{"1000}\symbol{"fe00}\symbol{"102d}..., breaks it further: none of the diacritics show up properly. The same happens with \char"1000 etc.

Any ideas on how to resolve this?

Best Answer

Amended answer

Following feedback in the comments: the Script= option alone is enough to activate the variation selector (VS1, the character at U+FE00), and the language setting has no effect (in this case).

[Image: output with dots]

fontspec also lets you define your own names for OpenType script tags (and, with \newfontlanguage, for language tags): \newfontscript{Phake}{mym2}.

So you can write things like

\newfontfamily\myanmara{Noto Serif Myanmar}[Colour=blue,Renderer=HarfBuzz,Script=Phake]

A recent Lualatex install does not need the Renderer=HarfBuzz part.

MWE

\documentclass[12pt]{article}
\usepackage[table]{xcolor}
\usepackage{fontspec}
\usepackage{xparse}% not needed for an up-to-date distribution
\ExplSyntaxOn
    % a handy command
    \cs_set_eq:NN \fontfeature \fontspec_if_feature:nTF
\ExplSyntaxOff
\newcommand\showyes{{\usefont{T1}{lmr}{m}{n}Yes}}
\newcommand\showno{{\usefont{T1}{lmr}{m}{n}No}}

\newfontscript{Phake}{mym2}
\newfontlanguage{Karen}{KSW}
\newfontlanguage{Mon}{MON}
\newfontfamily\myanmara{Noto Serif Myanmar}[Colour=blue,Renderer=HarfBuzz,Script=Phake]
\newfontfamily\myanmarb{Noto Serif Myanmar}[Colour=red,Renderer=HarfBuzz,Script=Phake,Language=Karen]
\newfontfamily\myanmarc{Noto Serif Myanmar}[Colour=brown,Renderer=HarfBuzz,Script=Phake,Language=Mon]

\begin{document}

Has \verb|ss01|: {\myanmara \fontfeature{ss01}{\showyes}{\showno}}

%with dots (with variation selectors) (using Phake=mym2 script)

\bigskip
\begin{tabular}{ll}
\rowcolor{blue!15}
Language & Result \\
\hline
None & \myanmara က︀ိင︀်ꩫ︀ႃလ︀ႃ \\
Karen & \myanmarb က︀ိင︀်ꩫ︀ႃလ︀ႃ \\
Mon & \myanmarc က︀ိင︀်ꩫ︀ႃလ︀ႃ \\
\hline
\end{tabular}

%\bigskip
%with no dots
%
%{\myanmarb ကိင်ꩫႃလႃ}

\end{document}
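If the source file has to stay ASCII-only, XeTeX and LuaTeX accept the ^^^^ notation (exactly four lowercase hex digits). It is converted at input time into the same character tokens as typing the characters directly, so the text should be shaped exactly as if it had been typed. A minimal sketch for XeLaTeX, reusing the Phake script name from the MWE above and spelling out only the first three codepoints from the question (U+1000, U+FE00, U+102D):

\documentclass[12pt]{article}
\usepackage{fontspec}
\newfontscript{Phake}{mym2}
\newfontfamily\myanmara{Noto Serif Myanmar}[Script=Phake]

\begin{document}
% typed directly: ka + VS1 + vowel sign i
{\myanmara က︀ိ}

% the same three characters in ASCII-only ^^^^ notation
% (hex digits must be lowercase in this notation)
{\myanmara ^^^^1000^^^^fe00^^^^102d}
\end{document}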


Xelatex

[Image: XeLaTeX output]

A couple of hundred glyphs are placed in the Private Use Area and all of them have Unicode value -1 ("bad character"), so the traditional way of accessing them by codepoint (with \Uchar"NNNN, where NNNN is the hexadecimal codepoint) doesn't work: neither lualatex nor xelatex can reach them that way.

In Xelatex, XeTeX primitives such as \XeTeXglyph, \XeTeXcountglyphs and \XeTeXglyphname give access to glyphs by their position number (glyph index) in the font.

A TECkit mapping file (loaded through fontspec's Mapping= option), or expl3's regex/find-replace functions, would allow a convenient ASCII transliteration input method. Note that TECkit mappings work in terms of Unicode codepoints.
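A minimal sketch of the expl3 find-replace route, assuming a made-up scheme in which k stands for က and k. for the dotted Aiton form (U+1000 U+FE00). The command name \aiton and the scheme itself are hypothetical; a real transliteration would need many more rules. The longer pattern is replaced first, so the plain k rule never sees it:

\documentclass[12pt]{article}
\usepackage{xparse}% not needed with LaTeX 2020-10 or newer
\usepackage{fontspec}
\newfontscript{Phake}{mym2}
\newfontfamily\myanmara{Noto Serif Myanmar}[Script=Phake]

\ExplSyntaxOn
\tl_new:N \l__aiton_text_tl
% hypothetical transliteration command: ASCII in, Myanmar script out
\NewDocumentCommand \aiton { m }
  {
    \tl_set:Nn \l__aiton_text_tl {#1}
    % "k." -> U+1000 U+FE00 (dotted form); must run before the plain rule
    \tl_replace_all:Nnn \l__aiton_text_tl { k. } { က︀ }
    % "k" -> U+1000
    \tl_replace_all:Nnn \l__aiton_text_tl { k } { က }
    { \myanmara \tl_use:N \l__aiton_text_tl }
  }
\ExplSyntaxOff

\begin{document}
\aiton{k. k}
\end{document}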

MWE

\documentclass[12pt]{article}
\usepackage{multicol}
\usepackage{xcolor}
\usepackage{fontspec}

\newcommand\tl[1]{{\usefont{T1}{lmtt}{m}{n}\small#1}}
\font\myanmara="[NotoSerifMyanmar-Regular.ttf]" at 12pt


\begin{document}

\myanmara 
က︀ိင︀်ꩫ︀ႃလ︀ႃ 

\tl{glyph count = }
\count255=\XeTeXcountglyphs\myanmara\relax
\tl{\the\count255}

\myanmara\XeTeXglyph130\XeTeXglyph384\XeTeXglyph134\XeTeXglyph406\XeTeXglyph146\XeTeXglyph373\XeTeXglyph150\XeTeXglyph373


\begin{multicols}{5}
\count255=1
\loop
\ifnum\count255 < \XeTeXcountglyphs\myanmara
\noindent
\centering
{\tl{\the\count255=}}
{\color{blue}\large\myanmara\XeTeXglyph\count255}
\par\noindent\centering
\tl{\tiny[\XeTeXglyphname\myanmara\count255]}\par
\ \par\hrule\par\ \par
\advance\count255 by 1
\repeat
\end{multicols}

\end{document}
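Hard-coded indices such as \XeTeXglyph130 break if a font update renumbers the glyphs. XeTeX's \XeTeXglyphindex primitive looks up the index of a named glyph in the current font, so the same glyph can be requested by name. A sketch, assuming the font's glyph names are the ones \XeTeXglyphname lists in the table above (the LuaTeX font cache gives index 130 the name ka_khm):

\documentclass[12pt]{article}
\font\myanmara="[NotoSerifMyanmar-Regular.ttf]" at 12pt

\begin{document}
\myanmara
% by index, as in the MWE above
\XeTeXglyph130

% by name: \XeTeXglyphindex looks the name up in the current font;
% \relax ends the name scan
\XeTeXglyph\XeTeXglyphindex "ka_khm"\relax

% print the looked-up index for comparison
\the\XeTeXglyphindex "ka_khm"\relax
\end{document}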

Lualatex

For reference, glyphs can be accessed by index/key in Lualatex too, using lua code.

Direct access to alternate glyphs, rather than indirect access via font features, variation selectors, etc., is possible with luatex's font cache information, provided HarfBuzz is not used as the font renderer. (HarfBuzz reaches every glyph the "normal" way, through the font's ligature and substitution tables; all of this information is in the font cache file, too.)

Luatex knows about fonts via its font cache, a set of files containing font information in lua-table structure.

In this case, we can load the .../texlive/2020/texmf-var/luatex-cache/generic/fonts/otl/notoserifmyanmar-regular.lua file with Lua's require function and iterate through, say, the descriptions table and its subtables to get at entries like this one:

 ["descriptions"]={
...

  [983058]={
   ["boundingbox"]=13,
   ["index"]=130,
   ["name"]="ka_khm",
   ["unicode"]=4096,
   ["width"]=1037,
  },
...
}

where [983058] is the internal key, i.e. the slot the glyph occupies in the loaded font (for an unencoded glyph like this one, a Private Use Area codepoint); ["index"] is the glyph ID (GID); ["name"] is the glyph name; and ["unicode"] is the Unicode codepoint this glyph relates to (4096, i.e. U+1000, the letter ka). By the time HarfBuzz sees the font with OpenType eyes, ka_khm has no Unicode value of its own (-1), so it is not directly accessible by codepoint.

(The glyph is still reachable indirectly, for example through its ligature entry, which replaces 4096 (U+1000) followed by 65024 (U+FE00) with slot 983058:

    ["features"]=316,
    ["flags"]=117,
    ["index"]=13,
    ["name"]="s_s_14",
    ["nofsteps"]=1,
    ["order"]=139,
    ["steps"]={
     {
      ["coverage"]={
       [4096]={
        [65024]={
         ["ligature"]=983058,
        },

so no problems there.)
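The decimal numbers in the cache are easier to read in hexadecimal. Converting them shows that the ligature entry above maps ka followed by VS1 to a slot in Unicode's Supplementary Private Use Area-A (U+F0000 to U+FFFFD), which fits ka_khm having no ordinary codepoint of its own:

\[
\begin{aligned}
4096   &= 1000_{16}                                                        && \text{U+1000 (ka)}\\
65024  &= \mathrm{FE00}_{16}                                               && \text{U+FE00 (VS1)}\\
983058 &= 983040 + 18 = \mathrm{F0000}_{16} + 12_{16} = \mathrm{F0012}_{16} && \text{U+F0012 (slot of ka\_khm)}
\end{aligned}
\]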

The trick for direct access to these substitutes and alternates is not to use HarfBuzz as the font renderer.

[Image: table of all glyphs]

Some glyphs will be positioned awkwardly with this method since it places them standalone and therefore out of context.

MWE

\documentclass{article}
\usepackage[table]{xcolor}
\usepackage{fontspec}
\usepackage{luacode}
\usepackage{multicol}
\usepackage{xparse}

\setlength{\columnsep}{0.3cm} \setlength{\columnseprule}{1pt}

\newfontfamily\flab{Noto Serif}% for labels
\newfontfamily\fmyan{NotoSerifMyanmar-Regular.ttf}

\let\nc\newcommand

%--------------------------------------
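\begin{luacode*}
    -- List every named glyph in a luaotfload cache file:
    -- internal key (slot), glyph name, and the glyph itself.
    -- luacode* is verbatim, so "\\" in these Lua strings is one backslash for TeX.
    function openmyfontfile(xxx)
        local fft = require(xxx)

        -- pairs() visits keys in no fixed order; sort them for a stable listing
        local keys = {}
        for k in pairs(fft.descriptions) do
            keys[#keys + 1] = k
        end
        table.sort(keys)

        tex.sprint("\\begin{multicols}{2}")
        for _, k in ipairs(keys) do
            local name = fft.descriptions[k].name
            if name then
                tex.sprint("\\noindent\\begin{tabular}{|l|l|c|}")
                tex.sprint("\\hline")
                tex.sprint("{\\flab\\small " .. k .. "}&")
                tex.sprint("{\\flab\\small \\detokenize{" .. name .. "}}&")
                tex.sprint("\\cellcolor{blue!12}\\Uchar" .. k .. "\\\\")
                tex.sprint("\\hline")
                tex.sprint("\\end{tabular}\\par")
            end
        end
        tex.sprint("\\end{multicols}")
    end
\end{luacode*}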
\nc\doopenff[1]{%
{\directlua{openmyfontfile("#1")}}
}

%===============

\begin{document}
\fmyan
\doopenff{<..long path name..>/texlive/2020/texmf-var/luatex-cache/generic/fonts/otl/notoserifmyanmar-regular.lua}

\end{document}
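To pick out a single glyph by name instead of listing all of them, the same cache data can be searched. A sketch under the same assumptions as the MWE above (node mode, no HarfBuzz; the cache path is elided as before). The helper name glyphbyname is hypothetical, and it uses Lua's dofile, which takes a plain file path, in place of the require call used above:

\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode}
% no Renderer=HarfBuzz: node mode, so private slots such as 983058 are reachable
\newfontfamily\fmyan{NotoSerifMyanmar-Regular.ttf}

\begin{luacode*}
  -- hypothetical helper: typeset the glyph whose cache entry has this name
  function glyphbyname(cachefile, glyphname)
    local fft = dofile(cachefile)   -- the cache file returns one big table
    for k, v in pairs(fft.descriptions) do
      if v.name == glyphname then
        tex.sprint("\\Uchar" .. k .. "\\relax")
        return
      end
    end
    -- not found: report in the log instead of typesetting anything
    texio.write_nl("glyphbyname: no glyph named " .. glyphname)
  end
\end{luacode*}
\newcommand\glyphbyname[2]{\directlua{glyphbyname("#1", "#2")}}

\begin{document}
\fmyan
\glyphbyname{<..long path name..>/texlive/2020/texmf-var/luatex-cache/generic/fonts/otl/notoserifmyanmar-regular.lua}{ka_khm}
\end{document}

The file is re-read on every call, which is fine for a few glyphs; for many lookups the table would be loaded once and kept in a Lua variable.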

Original answer

[Image: output with dots]

From a hint in the GitHub link, the dots are stored under StylisticSet=1 (ss01).

They are accessed via the script option (Script=Myanmar).

Lualatex has to be explicitly told to use the HarfBuzz font renderer (Renderer=HarfBuzz). Xelatex always shapes OpenType text with HarfBuzz, so there it is the default.
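If one preamble has to work under both engines, the renderer can be chosen per engine. A sketch using the iftex package's \ifLuaTeX switch, with the same font options as the MWE that follows; HarfBuzz is requested explicitly only under LuaLaTeX:

\documentclass[12pt]{article}
\usepackage{iftex}
\usepackage{fontspec}
\ifLuaTeX
  % LuaLaTeX: ask for HarfBuzz shaping explicitly
  \newfontfamily\myanmar{Noto Serif Myanmar}[StylisticSet=1,Renderer=HarfBuzz,Script=Myanmar]
\else
  % XeLaTeX: HarfBuzz is already the shaping engine
  \newfontfamily\myanmar{Noto Serif Myanmar}[StylisticSet=1,Script=Myanmar]
\fi

\begin{document}
{\myanmar က︀ိ}
\end{document}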

MWE

\documentclass[12pt]{article}

\usepackage{fontspec}
\newfontfamily\myanmar{Noto Serif Myanmar}[StylisticSet=1,Renderer=HarfBuzz,Script=Myanmar]
\newfontfamily\myanmarsans{Noto Sans Myanmar}[StylisticSet=1,Renderer=HarfBuzz,Script=Myanmar]

\begin{document}
Serif:

% with dots
{\myanmar က︀ိင︀်ꩫ︀ႃလ︀ႃ}

% no dots
{\myanmar ကိင်ꩫႃလႃ}

\bigskip
Sans Serif:

% with dots
{\myanmarsans က︀ိင︀်ꩫ︀ႃလ︀ႃ}

% no dots
{\myanmarsans ကိင်ꩫႃလႃ}

\end{document}