Consider the following MWE:
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=NoCommon]{Latin Modern Roman}
\usepackage{luacode,luatexbase}
\begin{luacode}
function dosub ( s )
s = string.gsub ( s , 'ff', '\\char64256{}')
return ( s )
end
--luatexbase.add_to_callback ( "process_input_buffer", dosub, "dosub" )
\end{luacode}
\begin{document}
off \directlua{ tex.sprint ( dosub ( \luastring{off} ) ) } off
\end{document}
The heart of the code is the function dosub, which employs the Lua function string.gsub. It is set to replace instances of ff with the glyph that contains the ff ligature. (You'll have to trust me that, for the font at hand, the ff-ligature glyph is located in "slot" 64256.) Note that, for now, the luatexbase.add_to_callback instruction is commented out. (A -- (double dash) string initiates a Lua comment.)
When this MWE is run, the middle word, which is generated via a \directlua call to dosub, correctly contains the ff-ligature, whereas the first and third words do not (once again correctly, since automatic ligature generation is disabled).
The trouble starts when I uncomment the instruction
luatexbase.add_to_callback ( "process_input_buffer", dosub, "dosub" )
Upon recompiling, the following, fairly incomprehensible, error message results:
(/usr/local/texlive/2015/texmf-dist/tex/context/base/supp-pdf.mkii
[Loading MPS to PDF converter (version 2006.09.02).]
\scratchcounter=\count290
\scratchdimen=\dimen261
\scratchbox=\box256
! Missing number, treated as zero.
\let
l.275 \let
\pdflastform=\pdflastxform
?
I suspect this is somehow related to the presence of a TeX macro, \char, in the replacement-string part of the string.gsub function. To wit, if I replace '\\char64256{}' with gg (i.e., a plain string that contains no macro), no error message is generated (and the three instances of "ff" in the body of the document are automatically replaced with "gg").
Do I need to "wrap" or "protect" the TeX macro in some special way in order to enable the successful use of "luatexbase.add_to_callback"? Is there something else I should do? About my computing setup: I'm running MacTeX 2015 (with all available updates through this morning applied) on a MacBook Pro running Mac OS X 10.10.5 "Yosemite".
Best Answer
There is a function, unicode.utf8.char, for inserting Unicode characters directly from Lua functions, so the replacement string does not need to contain a TeX macro at all. But the main issue in your code is that the callback is installed too early: it probably replaces ff in some macros that are loaded in \AtBeginDocument. So another solution is to install the callback in \AtBeginDocument as well, which reduces the risk of such collisions (you should do that even with the first method).
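A minimal sketch combining both suggestions: the replacement inserts the character itself via unicode.utf8.char(64256) instead of the \char macro, and the callback is registered only in \AtBeginDocument. This is a reconstruction, not the answer's original code; it assumes, as in the question, that the font has the ff ligature at U+FB00 (64256), and the test line is simplified to three plain words.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=NoCommon]{Latin Modern Roman}
\usepackage{luacode,luatexbase}
\begin{luacode}
-- Replace every ff by the ff-ligature character U+FB00 itself,
-- so no TeX macro ends up in the input line.
-- The outer parentheses keep only the first value returned by gsub.
function dosub ( s )
  return ( string.gsub ( s , 'ff', unicode.utf8.char(64256) ) )
end
\end{luacode}
% Install the callback only when the document starts, so files
% read while the preamble is processed are not touched
\AtBeginDocument{%
  \directlua{luatexbase.add_to_callback("process_input_buffer", dosub, "dosub")}%
}
\begin{document}
off off off
\end{document}
```

Hooks that packages registered earlier with \AtBeginDocument run before this one, so files they load at that point should be read before the callback is active.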
There is also another catch: what if your document body includes some macro with ff in its name? To fix that, we can use a function built around
s:gsub('(\\?)([%a%@]+)', function(back,text)
With this pattern we catch all words, including macros. If the variable back is not an empty string, the current word is a macro and we need to return it unprocessed. Otherwise, we can apply the ff replacement.
Note that in this case add_to_callback is used without \AtBeginDocument, because otherwise the text of an \offer macro defined in the preamble wouldn't be replaced. Because we now skip macros, installing the callback this early shouldn't matter.
As a closing remark, I would add that node-processing callbacks are much better for this kind of hack, exactly because of these problems with macros.
For instance, the following code works on nodes instead of input lines.
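A minimal sketch of such a node-based filter, assuming the same font setup as the question. Registering it on the pre_linebreak_filter callback is my choice; the answer does not say which callback it used. The function implements the steps described next: find an f glyph, check whether the next node is another f glyph, and if so turn the first into the ligature and remove the second.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=NoCommon]{Latin Modern Roman}
\usepackage{luacode,luatexbase}
\begin{luacode*}
local GLYPH = node.id("glyph")  -- look the id up instead of hard-coding it
local F     = string.byte("f")  -- 102
local FF    = 64256              -- U+FB00, the ff ligature

local function ff_ligature(head)
  local n = head
  while n do
    if n.id == GLYPH and n.char == F then
      local nxt = n.next
      -- merge only two f glyphs that are direct neighbours in the same font
      if nxt and nxt.id == GLYPH and nxt.char == F and nxt.font == n.font then
        n.char = FF                   -- turn the first f into the ligature
        head = node.remove(head, nxt) -- unlink the second f ...
        node.free(nxt)                -- ... and release its memory
      end
    end
    n = n.next
  end
  return head
end

luatexbase.add_to_callback("pre_linebreak_filter", ff_ligature, "ff_ligature")
\end{luacode*}
\begin{document}
off off off
\end{document}
```

Because fontspec loads luaotfload, which should have registered its own handler on this callback earlier, this function is expected to run after font processing; with Ligatures=NoCommon the two f glyphs are still separate at that point. Material in an \hbox goes through hpack_filter instead, and a word in which TeX has inserted a discretionary (a possible hyphenation point) between the two f's may be left alone.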
It is more complicated, because we can't operate at the string level, only on individual nodes. Many node types exist, but only glyph nodes are important for us (node.id 37 in the LuaTeX version of the time; the number differs between LuaTeX versions, so node.id("glyph") is the safer way to obtain it). Every glyph node has a char field holding the character code. When a glyph with the f character is found, we peek at the next node to see whether it is another f glyph. If it is, we replace the current character with the code for the ff ligature and delete the next f glyph.
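For comparison, the string-level function described above, the one that skips macros, might look roughly like this. It is a sketch, not the original answer's code: the function body is reconstructed from the description, and I added _ and : to the character class (an assumption, not in the original pattern) so that expl3-style control sequences, whose names contain those characters, are skipped as well. Ordinary words in any file read after the callback is installed are still rewritten, which is exactly the fragility the node-based approach avoids.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=NoCommon]{Latin Modern Roman}
\usepackage{luacode,luatexbase}
\begin{luacode*}
local FF = unicode.utf8.char(64256)  -- the ff-ligature character, U+FB00

function dosub(s)
  -- (\\?) captures an optional backslash; the second capture is a run of
  -- letters, @, _ or : (_ and : are an added assumption for expl3 names)
  return (s:gsub('(\\?)([%a@_:]+)', function(back, text)
    if back ~= '' then
      return back .. text          -- a control word: leave it unchanged
    end
    return (text:gsub('ff', FF))   -- an ordinary word: replace ff
  end))
end

-- registered directly in the preamble, not in \AtBeginDocument
luatexbase.add_to_callback("process_input_buffer", dosub, "dosub")
\end{luacode*}
% defined after the callback is active, so its body becomes o + ligature + er
\newcommand\offer{offer}
\begin{document}
off \offer\ off
\end{document}
```

Here the name \offer is matched with a leading backslash and left intact, while its body text and the two plain words are rewritten.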