This is more a proof-of-concept than a real bulletproof style file, but it does what you request:
The style file (filterltx.sty):
\ProvidesPackage{filterltx}
\RequirePackage{luatexbase,luacode}
\begin{luacode*}
do
  -- maps a lower-case word to the same word with "|*|" marking the break point
  local replace = {}

  local filter = function ( buf )
    local positions = {}
    -- search a lower-cased copy so that capitalised words match as well
    local lowered = string.lower(buf)
    for k,v in pairs(replace) do
      local init = 1
      local start,stop
      repeat
        -- fourth argument true: plain-text search, no Lua patterns
        start,stop = string.find(lowered,k,init,true)
        if start then
          init = stop + 1
          -- offset of the marker inside the replacement (plain-text search)
          local pos = string.find(v,"|*|",1,true)
          -- index in buf of the last character before the break
          positions[#positions + 1] = pos + start - 2
        end
      until start == nil
    end
    table.sort(positions)
    -- insert from right to left so that the earlier indices stay valid
    for i = #positions,1,-1 do
      buf = string.sub(buf,1,positions[i] ) .. [[\penalty10000\discretionary{-}{}{\kern.03em}\nobreak \hskip 0pt plus0pt minus0pt]] .. string.sub(buf, positions[i] + 1)
    end
    return buf
  end

  function enablefilter()
    luatexbase.add_to_callback('process_input_buffer', filter, 'filter')
  end

  function disablefilter()
    luatexbase.remove_from_callback('process_input_buffer', 'filter')
  end

  function translateinput( arg1,arg2 )
    replace[arg1] = arg2
  end
end
\end{luacode*}
\newcommand\enableinputtranslation{
\directlua{enablefilter()}
}
\newcommand\disableinputtranslation{
\directlua{disablefilter()}
}
\newcommand\translateinput[2]{
\directlua{translateinput("\luatexluaescapestring{#1}","\luatexluaescapestring{#2}")}
}
and the test document (test.tex):
\documentclass{article}
\usepackage{filterltx}
\translateinput{shelfful}{shelf|*|ful}
\translateinput{selfish}{self|*|ish}
\translateinput{halflife}{half|*|life}
\translateinput{cufflink}{cuff|*|link}
\begin{document}
Ligatures not disabled:\\
shelfful selfish halflife cufflink
\medskip
\enableinputtranslation
Ligatures disabled:\\
shelfful selfish halflife cufflink\\
Shelfful Selfish Halflife Cufflink
\medskip
\disableinputtranslation
Ligatures not disabled:\\
shelfful selfish halflife cufflink
% to make sure the words still hyphenate:
% \showhyphens{shelfful selfish halflife cufflink}
% yields: shelf- ful self- ish half- life cuff- link
\end{document}
Run with lualatex test.
[Figure: typeset output of test.tex]
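To see what the filter does to a single word, here is a sketch of the equivalent hand-typed input for shelfful. It assumes the replacement string in the style file above is unchanged, and it compiles without the package. The rest of the word follows the inserted material directly, which TeX accepts because the glue specification is already complete after minus0pt.

```latex
\documentclass{article}
\begin{document}
% Unfiltered input: the final f of "shelf" and the f of "ful" form a ligature.
shelfful

% What TeX sees with the filter enabled: the break material is spliced in
% at the |*| position, and "ful" follows it without a space.
shelf\penalty10000\discretionary{-}{}{\kern.03em}\nobreak \hskip 0pt plus0pt minus0ptful
\end{document}
```

Comparing the two lines in the output is a quick way to check the effect of the replacement string before registering more words with \translateinput.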
(I guess I could be called a member of one of the teams ;-) this is my view)
I thought of staying out of this debate, but perhaps some words of clarification or, let's say, some thoughts are in order after all.
LaTeX3 versus pure Lua
First of all this is the wrong question imho: LaTeX3 has different goals to LuaTeX and those goals may well be still a defunct pipe dream, but if so they are unlikely to be resolved by pure Lua either.
So if one wants to develop an argument along those lines then it should be more like "Why does LaTeX3 use an underlying programming language based on eTeX and not on LuaTeX where a lot of functionality would be available in a "simpler" way?"
But LaTeX3 is really about several different levels:
- underlying engine
- programming level
- typesetting element layer
- designer interface foundation layer
- document representation layer
See for example my talk at TUG 2011: http://www.latex-project.org/papers/
Here is a sketch of the architecture structure:
The chosen underlying engine is, at the moment, any TeX engine with the e-TeX extensions. The programming level is what we call "expl3", and that is what I guess you are referring to when you say "LaTeX3" (and I sometimes do that too). However, it is only the bottom box in the above diagram. The more interesting parts are those above the programming level (still largely a pipe dream, but moving along nicely now that the foundation on the programming level is stable). And for this part there is no comparison against Lua.
Why use LaTeX3 programming over Lua, when LuaTeX is available?
To build the upper layers it is extremely important to have a stable underlying base. As @egreg mentioned in chat: compare the package situation in 2.09 to the package situation in 2e. The moment there were standard protocols to build and interface packages the productivity increased a lot. However, the underlying programming level in 2e was and is still a mess which made a lot of things very complicated and often impossible to do in a reliable manner. Thus the need for a better underlying programming layer.
However, that programming layer is built on eTeX not because eTeX is the superior engine (it is compared to TeX, but not with respect to other extensions, be it Lua or some other engine) but because it is a stable base available everywhere.
So eTeX + expl3 is a programming layer that the LaTeX3 team can rely on not being further developed and changed (other than by us). It is also a base that is immediately available to everybody in the TeX world with the same level of functionality, as all engines in use implement the eTeX extensions.
Any larger level of modifications/improvements in the underlying engine is a hindrance to building the upper layers. True, some things may not work and some things may be more complicated to solve, but of the tasks we are looking at (well, I am), the majority are very much independent of that layer anyway.
To give a few examples:
- good algorithms for automatically doing complex page layout aren't there (as algorithms), so Lua will not help you here unless somebody comes up with such algorithms first.
- something like "coffins" is about how to approach "design"; the importance here is how to think about it, not how to implement it (that comes second) -- see "Is there no easier way to float objects and set margins?" or "LaTeX3 and pauper's coffins" for examples
Having said that, the moment LuaTeX (or at least a subset of LuaTeX) is as stable as eTeX, there might well be good reasons for replacing the underlying engine and the implementation of the programming layer. But it is not the focus (for now).
Why use LaTeX3 to program at all? Why not turn its development into the development of a LaTeX-style document design library for LuaTeX, written in Lua?
Could happen. But only if that "LuaTeX" would no longer be a moving target (because LaTeX3 on top would be enough of a moving target).
Side remark: @PatrickGundlach speculated in his answer that the
LaTeX3 goal is backwards compatibility. Wrong. The
same people that are coming down on you very strong about
compatibility for LaTeX2e have a different mindset here. We do not
believe that the interesting open questions that couldn't get resolved
for 2e could be resolved in any form or shape with LaTeX3 in a
document-compatible manner.
Input compatible: probably. But output compatibility for old
documents, no chance if you want to get anything right.
But in any case, this is not an argument for or against implementing the ideas we are working on one day with a LuaTeX engine.
Is the separation of LuaTeX and LaTeX3 a result (or at least an artifact) of the non-communication among developers that Ahmed Musa described in his comment to this answer? What kind of cooperation is there between these two projects to reduce duplication of effort?
As I tried to explain, there is not much overlap in the first place. There is much more overlap in conceptual ideas at the level of ConTeXt vs. LaTeX.
An even more fantastical notion is to implement every primitive, except \directlua itself, in terms of Lua and various internal typesetting parameters, thus completely divorcing the programming side of TeX from the typesetting side.
That brings us to a completely different level of discussion, namely: is a completely different approach to a typesetting engine possible, based on LuaTeX or anything else for that matter? That is a very interesting thought, but as @Patrick explained, it isn't enough to leave the typesetting to TeX and do everything else in a different language. So far such concepts have failed, whether it was NTS or anything else, because fundamentally (in my belief) we haven't yet grasped how to come up with a successful and different model to replace the TeX macro approach (as ugly as it might look in places).
Best Answer
A full answer here has several parts. First, at the time of writing it's important to bear in mind that ConTeXt is not only available but works well, while LaTeX3 is a concept which is being developed. That means that it's not even 100% clear what shape LaTeX3 will take. It's also not clear that LaTeX3 will deliver, but for the purposes of answering the question I'll ignore this! I'll also highlight what seem to be (broad) similarities.
TeX-based systems
There are then two broad areas to talk about: the user 'experience' and the implementation. In both areas there are differences, but I'd like to highlight one important similarity: both ConTeXt and LaTeX3 are ultimately TeX-based. A radically different approach from either would be to parse input using another language (Python is often highlighted, for its scripting ability), then convert to TeX primitives (plus low-level macros), and only do the real typesetting in TeX. Neither ConTeXt nor LaTeX3 does that.
At the user level
At the user level, LaTeX treats the document class as a key concept, i.e. every document starts by loading a class with \documentclass.
In LaTeX2e, the separation between a class and adding code is somewhat diffuse. The idea for LaTeX3 is to make 'design' and 'code' separate areas, and so have the document class as a purely design concept. ConTeXt does not enforce the idea of loading a particular 'style' for a document in the same way (although it is possible to load a module to set defaults). There is a key philosophical difference here, as ConTeXt is in many ways closer to the plain TeX concept of 'author as designer', while LaTeX3 is intended to enhance the separation of the two roles.
An area where there is clear similarity is that LaTeX3 will make a lot more use of keyval input 'out of the box' than LaTeX2e does. This is very much a similarity to ConTeXt, which makes extensive use of keyval. There are, however, differences in implementation (the classic one is that LaTeX keyval input skips spaces around the =, while ConTeXt does not).
Another similarity in this area is that the scope of 'core' LaTeX3 supported ideas is intended to be much broader than 'core' LaTeX2e ones, and thus similar to what ConTeXt manages. Quite how this works out depends on the development of LaTeX3, so it is not possible at this stage to give a more detailed analysis of this area. This area encompasses the 'limitations of LaTeX2e' part of the question. For example, ConTeXt can do proper grid typesetting, which is a significant challenge in LaTeX2e.
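As a rough illustration of keyval input at the expl3 level, here is a minimal sketch using the l3keys functions \keys_define:nn and \keys_set:nn. The module name demo, the command \demobox and the keys width and title are made up for this example, and the interface LaTeX3 eventually offers at the document level may look different.

```latex
\documentclass{article}
\usepackage{xparse} % loads expl3; not needed with a recent LaTeX kernel
\ExplSyntaxOn
% Variables for a hypothetical module called "demo"
\dim_new:N \l_demo_width_dim
\tl_new:N  \l_demo_title_tl
% Declare the keys and say where their values are stored
\keys_define:nn { demo }
  {
    width .dim_set:N = \l_demo_width_dim ,
    width .initial:n = 5cm ,
    title .tl_set:N  = \l_demo_title_tl
  }
% Hypothetical document command: optional keyval list, then the text
\NewDocumentCommand \demobox { O{} m }
  {
    \group_begin:
      \keys_set:nn { demo } {#1}
      \parbox { \l_demo_width_dim }
        { \textbf { \l_demo_title_tl } \par #2 }
    \group_end:
  }
\ExplSyntaxOff
\begin{document}
% Spaces around "=" and after "," in the key list are ignored
\demobox[width = 4cm, title = Note]{Some boxed text.}
\end{document}
```

Writing the optional argument as [width=4cm,title=Note] gives the same result, which is the space-skipping behaviour mentioned above.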
Implementation
At the implementation level, ConTeXt (Mark IV) uses a mix of TeX and Lua. On the other hand, LaTeX3 is (currently) dependent on e-TeX plus the \pdfstrcmp primitive (or equivalent functionality), and thus works with suitably recent versions of pdfTeX, XeTeX and LuaTeX. LaTeX3 then constructs a programming language of its own ('expl3') using the required TeX primitives. This is clearly a fundamental difference, as Lua provides ConTeXt with a flexible programming system and also access to TeX internals that are not available using primitives. At the time of writing, it's not clear how LaTeX3 might handle 'LuaTeX-only' ideas: might there be team-supported 'LuaTeX-only' modules, for example?
The fact that LaTeX3 uses only TeX, whereas ConTeXt uses a large amount of Lua, leads on to the fact that LaTeX3, like LaTeX2e, is intended to be a TeX format which can be used in a 'classical' manner. ConTeXt, in contrast, is a more 'dynamic' assembly as the Lua part is not saved into the format file. Thus ConTeXt is always executed using the context script. Using this script-based wrapper, ConTeXt can deal with multiple TeX runs, indexing and so forth 'automagically'. The intention (at present) is that LaTeX3 will work in the same way as LaTeX2e: one LaTeX run = one TeX run.
Documenting interfaces
ConTeXt is built explicitly on TeX and Lua, while LaTeX3 defines its own language, expl3. Thus programming ConTeXt means programming in TeX, which is not documented formally in the ConTeXt manuals. A key driver behind LaTeX3 is the idea that beyond the kernel, everything needed to program LaTeX3 should be documented in the LaTeX3 documentation.
LaTeX3 is also aiming for a clear separation between user functions and internal functions, i.e. for every document-level function \foo there should be a (documented) internal function \int_foo. ConTeXt has a very rich set of interfaces, but is not (to my knowledge) built on quite such strict 'two-layer' principles.
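The two-layer idea can be sketched with xparse and expl3 as follows. The names \emphtwice and \demo_emph_twice:n are invented for this illustration and are not part of any LaTeX3 module. The document-level command only parses its argument and hands it on to a code-level function that does the work.

```latex
\documentclass{article}
\usepackage{xparse} % loads expl3; not needed with a recent LaTeX kernel
\ExplSyntaxOn
% Code-level function of a hypothetical module "demo":
% this is the interface other programmers would call.
\cs_new_protected:Npn \demo_emph_twice:n #1
  { \emph {#1} ~ \emph {#1} }
% Document-level command: argument parsing only, no logic of its own.
\NewDocumentCommand \emphtwice { m }
  { \demo_emph_twice:n {#1} }
\ExplSyntaxOff
\begin{document}
\emphtwice{hello}
\end{document}
```

With this split, a package author can reuse \demo_emph_twice:n without depending on how the user-facing syntax of \emphtwice is defined.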