[Tex/LaTex] How to suppress the operation of a LuaTeX-defined macro on a string if the string is part of a macro or a label

ligatures luatex

I'm working towards creating a lualatex-based package that lets users automatically suppress the use of ligatures (for now, ff, fi, fl, ffi, ffl, and ft) for selected words. (For background see this question.) The package is set to work with both English and German words. The MWE below, a very much stripped-down version of the package, shows how to suppress the insertion of ligatures for four selected words: two English, two German. (The correct hyphenation of the selected words, both at the non-ligation points and potentially elsewhere in the words, is also taken care of. The little red dashes in the output image, generated by the showhyphens package, indicate where LuaLaTeX thinks it's OK to insert hyphenation breaks.)

Here's the problem I'm trying to solve: The package's main routine (implemented as a lua callback function that operates on process_input_buffer) turns out to be way too greedy for its own good: It tries to perform string substitution operations on everything in the input buffer, including the names and arguments of TeX macros. To make the package suitable for field work, I have to find a way to prevent the main text translation macro from operating on

  • string snippets that are parts of TeX macros and on
  • the arguments of select instructions, such as \label and \ref.

(There are probably other cases where the substitution shouldn't be applied either.)

Are there any conditionals — or how might one go about creating such conditionals? — to check if a string for which a match is found is part of either an already-defined macro or an argument of a \label or \ref (or \varioref, \cref, etc) macro? Alternatively, how might one prevent outright the substitution macro from operating on (i) any TeX macros and (ii) the arguments of selected macros?

A couple of quick illustrations of these problems:

  • Suppose that a document contains a macro named \bookshelfful. (Not exactly likely, of course, but this is just meant to provide an example.) I don't want my substitution routine operating on such a macro, as it would end up being transformed into \bookshelf \nobreak\hskip0pt \discretionary{\char\hyphenchar\font}{}{\kern\KERN} \nobreak\hskip0pt ful. Arggh.

  • Should there be a label in the document named "thm:cufflinks" (yeah, sure!), it must not get translated into "thm:cuff\nobreak\hskip0pt \discretionary{\char\hyphenchar\font}{}{\kern\KERN} \nobreak\hskip0pt links".

Double arrgh.
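
The over-eager matching is easy to reproduce in plain Lua, outside of TeX. The sketch below reuses the filter logic of the MWE further down and then adds one conceivable guard. The names guarded_filter and protected are hypothetical and not part of the package, and the guard assumes that each protected macro takes a single brace-delimited argument on the same input line.

```lua
-- Substitution table as \translateinput would build it in the MWE:
-- "|*|" has already been replaced by "\kernandhyph ".
local replace = {
  lfful  = [[lf\kernandhyph ful]],
  fflink = [[ff\kernandhyph link]],
}

-- Same logic as the MWE's filter: substitute everywhere in the line.
local function filter(buf)
  for key, val in pairs(replace) do
    buf = string.gsub(buf, key, val)
  end
  return buf
end

-- Hypothetical guard (an assumption, not the package's code):
-- control words are copied verbatim, and for the macros listed in
-- `protected` the following {...} group is copied verbatim as well.
local protected = { label = true, ref = true }

local function guarded_filter(buf)
  local out, pos = {}, 1
  while pos <= #buf do
    local s, e, name = string.find(buf, "\\(%a+)", pos)
    if not s then
      out[#out + 1] = filter(string.sub(buf, pos))  -- plain text only
      break
    end
    out[#out + 1] = filter(string.sub(buf, pos, s - 1))
    if protected[name] then
      -- "^" anchors the balanced-brace match directly after the name
      local _, arg_end = string.find(buf, "^%b{}", e + 1)
      if arg_end then e = arg_end end
    end
    out[#out + 1] = string.sub(buf, s, e)           -- copied untouched
    pos = e + 1
  end
  return table.concat(out)
end

print(filter([[\bookshelfful]]))
-- \bookshelf\kernandhyph ful
print(filter([[\label{thm:cufflinks}]]))
-- \label{thm:cuff\kernandhyph links}
print(guarded_filter([[\bookshelfful holds a shelfful]]))
-- \bookshelfful holds a shelf\kernandhyph ful
print(guarded_filter([[see \ref{thm:cufflinks} for cufflinks]]))
-- see \ref{thm:cufflinks} for cuff\kernandhyph links
```

A guard of this kind would still miss optional arguments, arguments that span several input lines, and macros that are not in the list, so it is at best a partial remedy.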

% !TEX TS-program = lualatex
\documentclass[12pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage[no-math]{fontspec}
   %% work around a bug in luaotfload (cf. https://tex.stackexchange.com/q/47031/5001)
\setmainfont[Renderer=Basic]{Latin Modern Roman}
\defaultfontfeatures{Ligatures={TeX,Common}}
\usepackage{showhyphens} % show all hyphenation points
\usepackage{luatexbase,luacode}

\begin{luacode*}

do
    local replace = {}

    local filter = function ( buf )
       for key, val in pairs ( replace ) do
           buf = string.gsub ( buf, key, val )
       end
       return buf
    end

    function translateinput ( arg1,arg2 )  -- with discretionary hyphen
       replace[arg1]=string.gsub(arg2,"|%*|",[[\kernandhyph ]])
    end

    function enablefilters()
       luatexbase.add_to_callback('process_input_buffer', filter, 'filter')
    end

end

\end{luacode*}

\newcommand\enableinputtranslation{ 
    \directlua{ 
        enablefilters() 
    } 
}

\newcommand{\kernandhyph}{%
   \nobreak\hskip0pt%
   \discretionary{\char\hyphenchar\font}{}{\kern\KERN}%
   \nobreak\hskip0pt%
}

\newcommand\translateinput[2]{ 
    \directlua{ 
        translateinput ( "\luatexluaescapestring{#1}",
                         "\luatexluaescapestring{#2}" ) 
    }
}

% some substitution rules
\translateinput{lfful}{lf|*|ful}    %% e.g., shelf-ful(s) bookshelf-ful(s)
\translateinput{fflink}{ff|*|link}  %%       cuff-link(s)
\translateinput{iflich}{if|*|lich}  %%       reif-lich begreif-lich tarif-lich
\translateinput{uflauf}{uf|*|lauf}  %%       auf-laufen
\translateinput{ufform}{uf|*|form}  %%       auf-formen

\newlength\KERN 
\setlength\KERN{0.07ex}  % trial value for amount of kern to be inserted

\begin{document}
shelfful cufflink unbegreiflich Auflaufform 

\quad \emph{versus} 
\enableinputtranslation  % turn on input translation

shelfful cufflink unbegreiflich Auflaufform
\end{document}

[Image: output of the MWE, showing the four words before and after input translation is enabled]

Best Answer

Here is my solution to this problem, which also uses the ligaturing callback (reusing lots of code from the earlier answer).

Instead of attempting to do the actual hyphenation in the processing function, my code inserts whatsit nodes at the key spots. Those whatsit nodes then prohibit ligature building at those spots.

\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode}%,luatexbase}
\setmainfont[Renderer=Basic]{Latin Modern Roman}
%\defaultfontfeatures{Ligatures={TeX,NoCommon}}
%\setmainfont{Linux Libertine O}
\usepackage[margin=1cm]{geometry}
\begin{luacode}
local glyph = node.id('glyph')
local glue = node.id("glue")
local whatsit = node.id("whatsit")
local userdefined
for n,v in pairs(node.whatsits()) do
  if v == 'user_defined' then userdefined = n end
end
local identifier = 123456  -- any unique identifier 
local noliga={}
debug=false
function debug_info(s)
  if debug then
    texio.write_nl(s)
  end
end
local blocknode = node.new(whatsit, userdefined)
blocknode.type = 100
blocknode.user_id = identifier

function process_ligatures(nodes,tail)
  local s={}
  local current_node=nodes--node.copy(nodes)
  local build_liga_table =  function(strlen,t)
    local p={}
    for i = 1, strlen do
      p[i]=0
    end
    for k,v in pairs(t) do
      debug_info("Match: "..v[3])
      local c= string.find(noliga[v[3]],"|")
      local correction=1
      while c~=nil do
         debug_info("Position "..(v[1]+c))
         p[v[1]+c-correction] = 1
         c = string.find(noliga[v[3]],"|",c+1)  
         correction=correction+1
      end   
    end
    debug_info("Liga table: "..table.concat(p, ""))
    return p
  end
  local apply_ligatures=function(head,ligatures)
     local i=1
     local hh=head
     local last=node.tail(head)
     for curr in node.traverse_id(glyph,head) do
       if ligatures[i]==1 then
         debug_info("Current glyph: "..unicode.utf8.char(curr.char))
         node.insert_before(hh,curr, node.copy(blocknode))
         hh=curr
       end 
       last=curr
       if i==#ligatures then 
         debug_info("Leave node list on position: "..i)
         break 
       end
       i=i+1
     end
     if(last~=nil) then
       debug_info("Last char: "..unicode.utf8.char(last.char))
     end
  end
  for t in node.traverse(nodes) do
    if t.id==glyph then
      s[#s+1]=string.lower(unicode.utf8.char(t.char))
    elseif t.id== glue then
      local f=string.gsub(table.concat(s,""),"[\\?!,\\.]+","") -- add all interpunction
      local throwliga={}    
      for k, v in pairs(noliga) do
        local count=1
        local match= string.find(f,k)
        while match do
          count=match
          debug_info("pattern match: "..f .." - "..k)  
          local n = match + string.len(k)-1
          table.insert(throwliga,{match,n,k})
          match= string.find(f,k,count+1)
        end
      end
      if #throwliga==0 then 
        debug_info("No ligature substitution for: "..f)  
      else
        debug_info("Do ligature substitution for: "..f)  
        local ligabreaks=build_liga_table(f:len(),throwliga)
        apply_ligatures(current_node,ligabreaks)
      end
      s={}
      current_node=t
    end    
  end
  -- node.ligaturing(nodes) -- not needed, luaotfload does ligaturing
end
function suppress_liga(s,t)
  noliga[s]=t
end
function drop_special_nodes (nodes,tail)
  for t in node.traverse(nodes) do
     if t.id == whatsit and t.subtype == userdefined and t.user_id == identifier then
        node.remove(nodes,t)
        node.free(t)
     end
  end
end
luatexbase.add_to_callback("ligaturing", process_ligatures,"Filter ligatures", 1) 
--luatexbase.add_to_callback("ligaturing", drop_special_nodes,"Drop filter ligatures", 2) 
\end{luacode}
\newcommand\suppressligature[2]{
\directlua{
    suppress_liga("\luatexluaescapestring{#1}","\luatexluaescapestring{#2}")
}
}
\newcommand\debugon{%
\directlua{
debug=true
}
}
\begin{document}

\suppressligature{fifi}{f|ifi}
\suppressligature{grafi}{graf|i}
\suppressligature{lfful}{lf|ful} 
\suppressligature{fflink}{ff|link}
\suppressligature{iflich}{if|lich}
\suppressligature{uflauf}{uf|lauf}
\suppressligature{ufform}{uf|form}
\debugon

shelfful 
cufflink
unbegreiflich 
Auflaufform
offen

\end{document}

As you can see, the code does not do any ligaturing at all (!) as that is handled by luaotfload in the pre_linebreak_filter.

However, this also creates a minor glitch: the added whatsits also prevent kerning at those spots, but they cannot be removed here because that would re-enable the ligatures once luaotfload comes into play. I do not know enough of the internals of lualatex to fix this (minor) problem.
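
To make the position bookkeeping in build_liga_table concrete, here is a stand-alone sketch in plain Lua that mirrors its arithmetic for a single word. The helper break_table is hypothetical and written only for illustration. It uses plain-text matching (the fourth argument of string.find), whereas the code above treats the keys as Lua patterns.

```lua
-- Rules as registered by \suppressligature: key -> key with "|" marks.
local noliga = { uflauf = "uf|lauf", ufform = "uf|form" }

-- Returns a string of 0s and 1s, one digit per character of `word`.
-- A 1 at position i means: insert a blocking node before glyph i.
local function break_table(word)
  word = string.lower(word)
  local p = {}
  for i = 1, #word do p[i] = 0 end
  for key, marked in pairs(noliga) do
    local match = string.find(word, key, 1, true)
    while match do
      local bar = string.find(marked, "|", 1, true)
      local correction = 1   -- bars seen so far, including this one
      while bar do
        -- bar - correction = letters of the key before this bar
        p[match + bar - correction] = 1
        bar = string.find(marked, "|", bar + 1, true)
        correction = correction + 1
      end
      match = string.find(word, key, match + 1, true)
    end
  end
  return table.concat(p)
end

print(break_table("Auflaufform"))  -- 00010001000
```

In "auflaufform" the key uflauf starts at position 2 and ufform at position 6, so the blocking nodes land before the 4th glyph (between f and l) and before the 8th glyph (between the two f's).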
