[TeX/LaTeX] In LuaTeX, is it possible to change font/language according to the script/glyphs used?

fonts hyphenation luatex

I am a XeLaTeX user and I often have to typeset English-Greek documents. Packages like xgreek or polyglossia are great, but to get the correct hyphenation rules you have to mark up the text that belongs to the secondary language. In the long run, this becomes cumbersome.

In XeLaTeX I use the \XeTeXinterchartoks mechanism, which lets me switch the hyphenation rules and/or the font automatically without declaring them explicitly. The procedure involves grouping the glyphs of a Unicode block into a character class and then automatically executing TeX commands at each transition from one class to another.

For those not familiar with \XeTeXinterchartoks, more information can be found in xetex-reference, pages 13-14.

Lately I have become interested in Lua(La)TeX, and I want to know whether there is a way to achieve similar results in LuaTeX. As was pointed out in a previous question, LuaTeX has no direct analogue of \XeTeXinterchartoks, but it was suggested that there are other, more powerful ways to achieve the same goal.

So the questions are:

  1. How can I change the hyphenation rules/font automatically according to the glyphs used without declaring them?
  2. Can someone provide a minimal working example or at least point me to some references?
  3. One of the drawbacks of XeTeXinterchartoks is that the settings are global. Is there a way in LuaTeX to set this feature (if it exists) on and off, thus providing more flexibility?

Best Answer

Here is a proof of concept of an equivalent of \XeTeXinterchartoks in LuaTeX.

First, a style file:

% luatexinterchartoks.sty
\newcount\XeTeXinterchartokenstate

\newcount\charclasses
\def\newXeTeXintercharclass#1%
  {\global\advance\charclasses1\relax
   \newcount#1
   \global#1=\the\charclasses
   }

\newcount\cchone
\newcount\cchtwo

\def\dodoXeTeXcharclass
    {\directlua{setcharclass(\the\cchone,\the\cchtwo)}}

\def\doXeTeXcharclass%
   {\afterassignment\dodoXeTeXcharclass\cchtwo }

\def\XeTeXcharclass%
   {\afterassignment\doXeTeXcharclass\cchone }

\protected\def\XeTeXdointerchartoks%
   {\directlua{setinterchartoks(\the\cchone,\the\cchtwo,\the\allocationnumber)}}

\protected\def\dodoXeTeXinterchartoks%
   {\newtoks\mytoks\afterassignment\XeTeXdointerchartoks\global\mytoks }

\protected\def\doXeTeXinterchartoks%
   {\afterassignment\dodoXeTeXinterchartoks\cchtwo }

\def\XeTeXinterchartoks%
   {\afterassignment\doXeTeXinterchartoks\cchone }

\directlua{dofile('luatexinterchartoks.lua')}

\endinput

And a matching Lua file:

-- luatexinterchartoks.lua

charclasses = charclasses or {}

function setcharclass (a,b)
   charclasses[a] = b
end

local i = 0
while i < 65536 do
  charclasses[i]  = 0
  i = i + 1
end

interchartoks =  interchartoks or {}

function setinterchartoks (a,b,c)
   interchartoks[a] = interchartoks[a] or {}
   interchartoks[a][b] = c
end

local nc, oc
oc = 255

function do_intertoks () 
  local tok = token.get_next() 
  if tex.count['XeTeXinterchartokenstate'] == 1 then
      if tok[1] == 11 or  tok[1] == 12 then
        nc = charclasses[tok[2]] 
        newchar = tok[2]
      else 
        nc = 255
        newchar = ''
      end
      local insert  = ''
      if interchartoks[oc] and interchartoks[oc][nc] then
          insert = interchartoks[oc][nc] 
          local newtok = tok
          if insert<100 then
            local dec = math.floor(insert / 10) + 48;
            local unit = math.floor(insert % 10) + 48;
            newtok = {
              -- \XeTeXinterchartokenstate=0 \the\toks<n> \XeTeXinterchartokenstate=1
              token.create('XeTeXinterchartokenstate'),
              token.create(string.byte('='),12),
              token.create(string.byte('0'),12),
              token.create(string.byte(' '),10),
              token.create('the'),
              token.create('toks'),
              token.create(dec,12),
              token.create(unit,12),
              token.create(string.byte(' '),10),
              token.create('XeTeXinterchartokenstate'),
              token.create(string.byte('='),12),
              token.create(string.byte('1'),12),
              token.create(string.byte(' '),10),
              {tok[1], tok[2], tok[3]}}               
          end
          tok = newtok
      end
      oc = nc
  end
  return tok
end

callback.register ('token_filter', do_intertoks)
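
The core of do_intertoks is a two-level table lookup keyed on the class of the previous character and the class of the current one. As a rough illustration, the following plain-Lua sketch replays that lookup on an ordinary string instead of a token stream. It uses no LuaTeX API at all. The names set_class, set_rule and annotate are hypothetical, treating every non-letter as boundary class 255 is a simplification of the catcode test above, and the extra boundary check at the end of the input is an added assumption.

```lua
-- Standalone sketch (plain Lua, no LuaTeX): class-transition lookup on a string.
-- All names are hypothetical; only the table layout mirrors the answer's code.

local BOUNDARY = 255   -- class used for anything that is not a letter

local classes = {}     -- classes[char] = class number (unset letters are class 0)
local rules = {}       -- rules[from][to] = text inserted at that transition

function set_class(char, class)
  classes[char] = class
end

function set_rule(from, to, text)
  rules[from] = rules[from] or {}
  rules[from][to] = text
end

-- Simplification: letters get their assigned class, everything else is a boundary.
local function class_of(char)
  if char:match("%a") then
    return classes[char] or 0
  end
  return BOUNDARY
end

local function lookup(from, to)
  return rules[from] and rules[from][to]
end

-- Walk the string and emit the rule text whenever (old class, new class) has a rule.
function annotate(text)
  local out = {}
  local old = BOUNDARY                  -- start of input counts as a boundary
  for char in text:gmatch(".") do
    local new = class_of(char)
    local insert = lookup(old, new)
    if insert then out[#out + 1] = insert end
    out[#out + 1] = char
    old = new
  end
  local tail = lookup(old, BOUNDARY)    -- assumption: end of input is a boundary too
  if tail then out[#out + 1] = tail end
  return table.concat(out)
end

-- Same classes and transitions as the test document, with visible markers
-- standing in for the TeX commands.
set_class("a", 1)
set_class("A", 2)
set_class("B", 3)
set_rule(1, 2, "[")
set_rule(2, 1, "]")
set_rule(BOUNDARY, 3, "<")
set_rule(3, BOUNDARY, ">")
set_rule(3, 3, ".")

print(annotate("aAa A a B aBa BB"))  --> a[A]a A a <B> aBa <B.B>
```

Note that "aBa" is left untouched: a rule fires only for the exact (previous class, current class) pair that was registered, and no rule exists between the class of "a" and the class of "B".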

And a test document:

\documentclass{article}

\usepackage{luatexinterchartoks}
\usepackage{color}

\begin{document}

\newXeTeXintercharclass \mycharclassa
\newXeTeXintercharclass \mycharclassA
\newXeTeXintercharclass \mycharclassB
\XeTeXcharclass `\a \mycharclassa
\XeTeXcharclass `\A \mycharclassA
\XeTeXcharclass `\B \mycharclassB
% between "a" and "A":
\XeTeXinterchartoks \mycharclassa \mycharclassA = {[\itshape}
\XeTeXinterchartoks \mycharclassA \mycharclassa = {\upshape]}
% between " " and "B":
\XeTeXinterchartoks 255 \mycharclassB = {\bgroup\color{blue}}
\XeTeXinterchartoks \mycharclassB 255 = {\egroup}
% between "B" and "B":
\XeTeXinterchartoks \mycharclassB \mycharclassB = {.}

\begingroup
\XeTeXinterchartokenstate = 1
aAa A a B aBa BB
\endgroup

\end{document}

Not very pretty, but it proves that it can be done ...
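
For the English-Greek case in the question, classes would more naturally be assigned per Unicode block than one character at a time. The sketch below is only an illustration of that idea in plain Lua (it needs the utf8 library of Lua 5.3 or later and is not wired into any LuaTeX callback). The function names script_of and split_runs are hypothetical, only ASCII letters are treated as Latin, and letting neutral characters such as spaces inherit the script of the current run is an assumption, not something the code above does.

```lua
-- Sketch only: classify code points by Unicode block and split text into runs.
-- Requires the utf8 library (Lua 5.3+). All names here are hypothetical.

local GREEK_BLOCKS = {
  { 0x0370, 0x03FF },  -- Greek and Coptic
  { 0x1F00, 0x1FFF },  -- Greek Extended
}

-- Return "greek", "latin", or nil for neutral characters (spaces, digits, ...).
function script_of(cp)
  for _, block in ipairs(GREEK_BLOCKS) do
    if cp >= block[1] and cp <= block[2] then
      return "greek"
    end
  end
  -- Simplification: only ASCII letters count as Latin in this sketch.
  if (cp >= 0x41 and cp <= 0x5A) or (cp >= 0x61 and cp <= 0x7A) then
    return "latin"
  end
  return nil
end

-- Split text into consecutive runs of one script. Neutral characters stay in
-- the current run; default_script is used if the text starts with one.
function split_runs(text, default_script)
  local runs = {}
  local current = nil
  for _, cp in utf8.codes(text) do
    local script = script_of(cp) or (current and current.script) or default_script
    if current == nil or script ~= current.script then
      current = { script = script, text = "" }
      runs[#runs + 1] = current
    end
    current.text = current.text .. utf8.char(cp)
  end
  return runs
end

-- Usage: every run boundary is a point where a language or font switch
-- would have to be issued.
for _, run in ipairs(split_runs("abc αβγ def", "latin")) do
  print(run.script, "[" .. run.text .. "]")
end
```

For the sample string this yields three runs: a Latin run "abc ", a Greek run "αβγ " and a Latin run "def". Each boundary between runs corresponds to one class transition in the scheme above.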
