[TeX/LaTeX] Is direct UTF-8 input of combining diacritics in math mode possible with LuaLaTeX?

Tags: accents, luatex, math-mode, unicode, unicode-math

I am trying to typeset a document that contains combining diacritics entered directly as UTF-8. I use LuaLaTeX. Here is a minimal example illustrating the original issue:

\documentclass{minimal}
\usepackage{unicode-math}
\setmathfont{XITS Math}
\begin{document}
$v⃗$
\end{document}

The above vector arrow (U+20D7) is completely lost in the output. In text mode it would be shown, but in math mode it is discarded from the horizontal list altogether.

Then I tried the following:

\documentclass{minimal}
\usepackage{unicode-math}
\setmathfont{XITS Math}
{
\catcode`\_=11\relax
\catcode`\:=11\relax
\gdef\SetMathCode#1#2{\um_set_mathcode:nnn{#1}{#2}\um_symfont_tl}
}
\SetMathCode{"20D7}\mathaccent
\begin{document}
$v⃗$
\end{document}

This code essentially uses \Umathcode, indirectly through a macro in the unicode-math package. I did this because I found I had to change the math family of the arrow to the XITS font: the mappings for the diacritics (and possibly some of the other characters) are not set up automatically for math mode.

Now the arrow is typeset next to the v, on its right. I want it to be typeset as an accent, above the v. The \vec macro and \Umathaccent work, but I want to keep the formulas readable as plain text if possible. (I use the Emacs quail system for input.)

Could you please advise?

My LuaTeX version is beta-0.70.2, TeX Live 2012, LaTeX2e <2011/06/27>

XITS font is version 1.105.

Thanks in advance

Note:


Obviously the problem arises when accenting the special script-like letters.

In the end, the issue seems to be the handling of the special top-accent glyph metric. It is supposed to be handled as described in the "Math accent handling" section of the LuaTeX manual, but in reality this happens only for the \Umathaccent command and was, I think, forgotten for combining characters. The text version of the font uses a different mechanism with horizontal offsets (called "bearings"?) and so works around this limitation.

I will investigate this a bit further. If it is a core issue, I should file this with the LuaTeX guys. Consider the question closed. It became too specific anyway.

Best Answer

I got it working with a Lua script. Your minimal example becomes:

\documentclass{minimal}
\usepackage{unicode-math}
\setmathfont{XITS Math}
\AtBeginDocument{\directlua{require("combining_preprocessor.lua")}}
\newcommand{\⃗}[1]{\ensuremath{\vec{#1}}}
\begin{document}
$v⃗$
\end{document}

The idea is that it is difficult to make LaTeX handle a command or macro that comes after its argument, which is how Unicode combining characters work, so we use a preprocessor to move the accent in front of its argument. That is, a script maps v⃗ to \⃗{v}, and you then define whatever action you want \⃗ to have. (That's a backslash followed by a combining arrow, which should be printed above the backslash.)
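
To make the target of that mapping concrete, here is a minimal sketch of the form TeX sees after preprocessing, typed out by hand. It assumes the same XITS Math setup as above and deliberately does not load combining_preprocessor.lua: as far as I can tell from reading the script, it would also rewrite a hand-written \⃗{v}, treating the backslash as the base character.

```latex
% Compile with lualatex. The preprocessor is NOT loaded here on purpose.
\documentclass{minimal}
\usepackage{unicode-math}
\setmathfont{XITS Math}
% Control sequence whose name is the combining arrow itself.
\newcommand{\⃗}[1]{\ensuremath{\vec{#1}}}
\begin{document}
% Hand-written equivalent of what the script makes of "v" + combining arrow:
$\⃗{v}$
\end{document}
```

This should give the same output as $\vec{v}$, which is all the macro does; the preprocessor only saves you from typing the accent first.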

My Lua script handles most (all?) of the combining characters, so you just need to define what they should do in the .tex file. Multiple accents on the same character are possible. Example:

\documentclass{minimal}

\usepackage{unicode-math}
\setmathfont{XITS Math}

\AtBeginDocument{\directlua{require("combining_preprocessor.lua")}}

\newcommand{\̂}[1]{\ensuremath{\hat{#1}}}
\newcommand{\⃑}[1]{\ensuremath{\vec{#1}}}
\newcommand{\̱}[1]{\ensuremath{\underline{#1}}}
\newcommand{\́}[1]{\ensuremath{\acute{#1}}}

\usepackage{stackrel}
\newcommand{\᷽}[1]{\ensuremath{\stackrel[\approx]{}{#1}}}

\begin{document}

Hello

$ℂ̂$ is hat on $ℂ$, more on $ℂ̂⃑$ (stress test)

$ℂ̂ x̂$

Many combining accents on $x᷽̱̂́⃑$ is cool.

\end{document}

(My browser doesn't do the many combining characters justice here, but it looks nice in the PDF file.)
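
Tracing dobuffer by hand suggests that the first combining character typed after the base ends up innermost: a base with a hat and then an arrow becomes arrow-macro{hat-macro{base}}. With the definitions above, the stress-test formula would then be equivalent to the explicit nesting below. This is a reading of the script, not documented behaviour, and the sketch avoids the preprocessor entirely.

```latex
% Compile with lualatex; no preprocessor is needed for this explicit form.
\documentclass{minimal}
\usepackage{unicode-math}
\setmathfont{XITS Math}
\begin{document}
% Expected equivalent of the base character followed by hat, then arrow:
% the hat (typed first) is applied first, the arrow wraps the result.
$\vec{\hat{ℂ}}$
\end{document}
```

If the nesting order matters for a particular pair of accents, the order in which the combining characters are typed is the thing to change.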

Not sure if this is the ideal way of doing things, but for what it's worth, here is combining_preprocessor.lua:

function minornil(a, b)
   if a == nil and b == nil then
      return nil
   elseif a == nil then
      return b
   elseif b == nil then
      return a
   else
      return math.min(a, b)
   end
end

function findfirstcombining(line, n)
   local a = string.find(line, "\204[\128-\191]", n)     -- From U0300,
   local b = string.find(line, "\205[\128-\175]", n)     -- to U036F.
   a = minornil(a, b)
   b = string.find(line, "\226\131[\144-\176]", n) -- U20D0 to U20F0
   a = minornil(a, b)
   b = string.find(line, "\225\183[\128-\191]", n) -- U1DC0 to U1DFF
   a = minornil(a, b)
   return a
end

function is_utf8_continuation(byte)
   -- UTF-8 continuation bytes are 0x80..0xBF, i.e. 128..191 inclusive.
   return byte >= 128 and byte <= 191
end

function find_next_utf8_char(str, n)
   while str:byte(n) ~= nil and is_utf8_continuation(str:byte(n)) do
      n = n + 1
   end
   return n
end

function combining_iter(str)
   local n = 0
   return function ()
      n = (n ~= nil) and findfirstcombining(str, n + 1)
      return n
   end
end

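-- Rewrite one input line: each base character followed by one or more
-- combining characters becomes nested macro calls, "\<combining>{<base>}".
function dobuffer(line)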
   local n1 = 0
   local t = {}
   for n2 in combining_iter(line) do
      if n2 > n1 then
         local n3 = n2
         repeat
            n3 = n3 - 1
         until not is_utf8_continuation(line:byte(n3))
         table.insert(t, string.sub(line, n1, n3 - 1))
         n1 = find_next_utf8_char(line, n2 + 1)
         local comb = {}
         table.insert(comb, "\\" .. string.sub(line, n2, n1 - 1) .. "{")
         table.insert(comb, string.sub(line, n3, n2 - 1) .. "}")
         n2 = findfirstcombining(line, n1)
         while n2 == n1 do
            n1 = find_next_utf8_char(line, n2 + 1)
            table.insert(comb, 1, "\\" .. line:sub(n2, n1 - 1) .. "{")
            table.insert(comb, "}")
            n2 = findfirstcombining(line, n1)
         end
         table.insert(t, table.concat(comb))
      end
   end
   table.insert(t, string.sub(line, n1))
   return table.concat(t)
end

luatexbase.add_to_callback("process_input_buffer",
                           dobuffer, "combining_preprocessor", 1)