[Please correct me if my statements are wrong]
TL;DR: pdftex, xetex and luatex exhibit different behaviour. What are these differences and why are they present?
I know of three engines that implement TeX and support native PDF output: pdftex, xetex and luatex. Unfortunately, they are not completely compatible.
pdftex is an ɛ-TeX engine that is backward compatible with Knuth's TeX, i.e. there will be no difference between the typeset output of pdftex and tex (breakpoints, page breaks, ligatures, dimensions, etc. are exactly the same).
Example of an Incompatibility
This is not true for xetex and luatex. I know of just one obvious incompatibility, which I will support with an MWE in a moment. To prevent a ligature, Knuth suggests several alternatives in the TeXbook in exercise 5.1:

{shelf}ful or shelf{}ful, etc.; or even shelf\/ful, which yields a shelfful (w/o ligature) instead of a shelfful (w/ ligature). In fact, the latter idea, to insert an italic correction, is preferable because TeX will reinsert the ff ligature by itself after hyphenating shelf{}ful. (Appendix H points out that ligatures are put into a hyphenated word that contains no “explicit kerns,” and an italic correction is an explicit kern.) But the italic correction may be too much (especially in an italic font); shelf{\kern0pt}ful is often best.
Thus the following plain TeX document should always result in the ff ligature being suppressed:
shelf{}ful
\bye
pdftex
The output of tex and pdftex is as expected.
luatex
But with luatex I obtain output in which the ff ligature is still present.
xetex
For xetex I get the same as for pdftex, but as soon as I load an OpenType font, as in
\font\test="CMU Serif" \test
shelf{}ful
\bye
the ligature reappears
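For anyone who wants to repeat the comparison, a minimal plain TeX sketch along the following lines puts the variants from the TeXbook quote next to each other. It is only a test file, not a fix: which lines show the ligature depends on the engine and on the font in use, so I deliberately claim no expected output here.

```tex
% Plain TeX test file: run it unchanged through tex, pdftex, xetex and luatex
% and compare the "ff" in each line of the output.
\parindent=0pt
shelfful\par            % no intervention
shelf{}ful\par          % empty group
{shelf}ful\par          % group around the first part
shelf\/ful\par          % italic correction
shelf{\kern0pt}ful\par  % explicit zero kern
\bye
```

Adding a \font line such as the one in the xetex example above before the test words would repeat the same comparison with an OpenType font.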
Another example (which turned out to be a bug)
The code
${}\limits$
\bye
will typeset without any complaint under luatex, whereas it will throw the error
! Limit controls must follow a math operator.
with pdftex or xetex.

This example will also throw the appropriate error with luatex versions later than beta-0.79.1.
Summary
In the end my question boils down to several points:
- What are the incompatibilities of xetex and luatex with pdftex?
- Why did the designers choose to break compatibility with Knuth?
- Can I restore compatibility by making specific design choices (e.g. breaking ligatures with \kern0pt instead of {})?
N.B.: Please don't get too caught up in the example. I'm not asking for a solution to this specific problem. If you are searching for a solution to it, visit these links:
- Difference between {} and \/ for breaking ligatures
- Ligatures and hyphenations: the effect of empty brace groups {}
List of known incompatibilities
Here I collect links to questions/answers which point out some of the incompatibilities. It is intended for users who are contemplating switching engines and want to know what to watch out for.
Best Answer
pdfTeX is intended to offer complete compatibility with Knuth's TeX, and thus, if the e-TeX extensions are not enabled, it should act in the same way.
XeTeX is based on the e-TeX code and does not set out to break any compatibility with Knuth's TeX unless it is absolutely necessary (i.e. there is no reimplementation of algorithms unless this relates to adding new features). However, there are places where differences occur. As noted in the question, XeTeX can load system fonts. When this is done, a new approach to placing boxes on the page is used. Possibly only those with deep involvement in that code can comment on whether it was absolutely necessary not to support the {} approach to breaking ligatures, but as noted that doesn't always work anyway. Changes also occur where the classical TeX syntax is extended. For example, allowing more than two ^ is done to allow access to the full Unicode range. However, as shown in http://wiki.contextgarden.net/Encodings_and_Regimes, that leads to code which does different things in 8-bit and Unicode TeX engines.

LuaTeX is a very different case. The designers have decided to revisit a number of Knuth's decisions: the LuaTeX manual covers the detail (there is quite a bit). For example, the fact that {} does not inhibit a ligature is deliberate and relates to how LuaTeX processes the input and represents it in internal data structures. LuaTeX treats hyphenation as a property of language, not of font. As such, LuaTeX can hyphenate words using different fonts if the language does not change. As a result, hyphenation is governed by per-language primitives. (LuaTeX can also hyphenate the first word of a paragraph, which Knuth's TeX does not do and which is nowadays a 'feature'.) LuaTeX also has the same issues as XeTeX in terms of extending primitives for Unicode working: see the page linked above for an example.

It is worth noting that, as LuaTeX supports callbacks, functionality that the engine itself leaves unchanged may still be altered by Lua code. An obvious example is the \font primitive. This is not extended by the engine, but is by a Lua-based font loader: plain and LaTeX users share the same code here, while ConTeXt has its own (related) loader.

For both XeTeX and LuaTeX it is worth noting that the extension of math mode to allow a number of additional Unicode math parameters to be used (all prefixed \Umath...) means that math mode spacing may change if these additional data points are available, principally when using a Unicode math font.

The bottom line from all of this is that if you have an 8-bit document written for pdfTeX, e-TeX or indeed TeX90, you should be able to use pdfTeX to process it unchanged. XeTeX will give the same result with almost all files of the same form, assuming they don't contain any engine tests or similar, and assuming that they contain no driver-specific code (XeTeX uses the xdvipdfmx driver in all cases, pdfTeX may use dvips, dvipdfmx or direct PDF output). LuaTeX may change the behaviour of such documents, including but not limited to hyphenation, line breaking, ligature formation and so on.
Looking at the question purely in terms of primitives, we have to decide if we are comparing XeTeX and LuaTeX with TeX90, e-TeX or pdfTeX 1.40. The question seems to be focussed on 'current' engines, so I will take pdfTeX 1.40 as the 'reference' (it incorporates the e-TeX modifications to TeX90 plus a range of additional primitives). As noted in the part above, some behaviours are changed in XeTeX and LuaTeX. I'll note where possible any TeX90/e-TeX/pdfTeX variations which seem important in this context. Quite a bit of this information is available in the LuaTeX manual.
As XeTeX and LuaTeX allow Unicode input, primitives which are followed by the <number> of a character are affected by the change:
\char
\lccode
\uccode
\catcode
\sfcode
\efcode (LuaTeX-only: see below)
\lpcode
\rpcode
\chardef
These all accept the full Unicode range (up to 0x10FFFF) with the newer engines: pdfTeX, like e-TeX and TeX90, allows only the 8-bit range (maximum 0xFF).
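As a rough illustration of the wider character range, the following plain TeX sketch only makes the out-of-range assignments when a Unicode engine is detected. The \Umathcode probe and the name \grin are assumptions made up for this example.

```tex
% Sketch: \grin is a hypothetical name; \Umathcode is only used as a probe.
\ifx\Umathcode\undefined
  \message{8-bit engine: character codes stop at 255}
\else
  \catcode"1F600=12        % code point well beyond the 8-bit range
  \chardef\grin="1F600
  \message{Unicode engine: \meaning\grin}
\fi
\bye
```

Without the guard, pdfTeX would stop at the \catcode line with a "Bad character code" error.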
LuaTeX extends the range of registers allowed beyond that of e-TeX. Thus while pdfTeX and XeTeX allow up to 32767 box, count, dimen, muskip, marks and toks registers, LuaTeX allows a 16-bit range (max is 65535). This affects the primitives
\count
\dimen
\skip
\muskip
\marks
\toks
\countdef
\dimendef
\skipdef
\muskipdef
\toksdef
\box
\unhbox
\unvbox
\copy
\unhcopy
\unvcopy
\wd
\ht
\dp
\setbox
\vsplit
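A quick way to see the register limit is a guarded assignment such as the sketch below. Testing for \directlua is an assumed way of detecting LuaTeX, and the register number 40000 is arbitrary.

```tex
% Sketch: 40000 lies above the e-TeX limit (32767) but below 65535.
\ifx\directlua\undefined
  \message{not LuaTeX: register numbers above 32767 are rejected}
\else
  \count40000=42
  \message{LuaTeX: count40000 holds \the\count40000}
\fi
\bye
```

Documents that rely on such high register numbers are therefore tied to LuaTeX.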
The \font primitive is extended by XeTeX to allow loading of system fonts. In the extended syntax, the ⟨font identifier⟩ may be given in square brackets for a file name, or without them for a 'friendly' (system) name. This is not the case in LuaTeX: as noted above, LuaTeX is normally used with a Lua-based font loader which modifies the primitive via a callback.

LuaTeX allows file names to be given in braces as primitive syntax. This affects the primitives
\font (note: this is purely to do with the file name of the font)
\input
\openin
\openout
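The two file-name extensions can be tried out with a guarded sketch like this one. The font name "CMU Serif" is taken from the question; the font file lmroman10-regular.otf, the file name "my notes.tex" and the stream number 3 are assumptions for illustration, and the engine probes (\XeTeXrevision, \directlua) are just one possible way of testing.

```tex
% Sketch only: the font name, font file and input file are assumptions.
\ifx\XeTeXrevision\undefined\else
  \font\byname="CMU Serif" at 10pt                 % 'friendly' system name
  \font\byfile="[lmroman10-regular.otf]" at 10pt   % file name in brackets
  \byname shelf{}ful \byfile shelf{}ful
\fi
\ifx\directlua\undefined\else
  \openin3={my notes.tex}   % braced file name, LuaTeX primitive syntax
  \ifeof3 \message{my notes.tex not found}\else \closein3 \fi
\fi
\bye
```

\openin is used rather than \input so that the sketch still runs when the hypothetical file does not exist.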
pdfTeX adds a number of primitives to e-TeX, some related to PDF creation, some for microtypography and some general utilities. As XeTeX is based directly on e-TeX and not on pdfTeX, it only features some of these where they have been ported across. Some of the primitives are also renamed as they are not PDF-related. Thus XeTeX includes the following concepts introduced by pdfTeX:
\lpcode
\rpcode
\pdfpageheight
\pdfpagewidth
\pdfsavepos
\pdflastxpos
\pdflastypos
\ifincsname
\ifprimitive (\ifpdfprimitive in pdfTeX)
\primitive (\pdfprimitive in pdfTeX)
\strcmp (\pdfstrcmp in pdfTeX)
\shellescape (\pdfshellescape in pdfTeX)
\normaldeviate (TL'19 onward, \pdfnormaldeviate in pdfTeX)
\uniformdeviate (TL'19 onward, \pdfuniformdeviate in pdfTeX)
\randomseed (TL'19 onward, \pdfrandomseed in pdfTeX)
\setrandomseed (TL'19 onward, \pdfsetrandomseed in pdfTeX)
\elapsedtime (TL'19 onward, \pdfelapsedtime in pdfTeX)
\resettimer (TL'19 onward, \pdfresettimer in pdfTeX)
\filedump (TL'19 onward, \pdffiledump in pdfTeX)
\filemoddate (TL'19 onward, \pdffilemoddate in pdfTeX)
\filesize (TL'19 onward, \pdffilesize in pdfTeX)
\mdfivesum (TL'19 onward, \pdfmdfivesum in pdfTeX)
but not for example \efcode (as noted above), \pdfliteral or many others.

LuaTeX is based on pdfTeX and retains some of the primitives introduced there, renames some to remove 'pdf' and drops others. As well as primitives marked as experimental or deprecated in pdfTeX 1.40, LuaTeX also removes the primitives:
\pdfelapsedtime
\pdfescapehex
\pdfescapename
\pdfescapestring
\pdffiledump
\pdffilemoddate
\pdffilesize
\pdflastmatch
\pdfmatch
\pdfmdfivesum
\pdfresettimer
\pdfshellescape
\pdfstrcmp
\pdfunescapehex
and provides
\primitive
\ifprimitive
\ifabsnum
\ifabsdim
without 'pdf' in the name. It also moves all of the 'back end' concepts (to do with producing PDF output) to three new primitives (\pdfextension, \pdfvariable and \pdffeedback) which implement the functionality of the various PDF-related \pdf... primitives from pdfTeX.

Currently, XeTeX and pdfTeX use the 'TeX--XeT' model for right-to-left typesetting, while LuaTeX uses one derived from Omega/Aleph. As such, LuaTeX does not feature the primitives
\TeXXeTstate
\beginR
\beginL
\endR
\endL
(Note that there has been a suggestion that XeTeX may at some stage move from TeX--XeT to the Omega model.)
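A document that uses the TeX--XeT primitives can at least detect their absence, as in the sketch below. It assumes a format with the e-TeX extensions enabled for pdfTeX and XeTeX; the sample words are arbitrary.

```tex
% Sketch: probe for TeX--XeT before using \beginR ... \endR.
\ifx\TeXXeTstate\undefined
  \message{TeX--XeT primitives are not available in this engine}
\else
  \TeXXeTstate=1
  \hbox{left \beginR reversed\endR\ right}
\fi
\bye
```

With LuaTeX (or Knuth's tex) only the message appears, since the primitives are undefined there.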
LuaTeX also alters the behaviour of \endlinechar and \newlinechar: the maximum value is 127, while setting any value below zero stores -1.

Both XeTeX and LuaTeX add new primitives to TeX, and the behaviour of these of course requires the appropriate engine. Note in particular that new primitives for Unicode math handling (\Umath...) are available in both engines. They also both feature \suppressfontnotfounderror.
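Given the renamed and removed utility primitives listed earlier, code meant to run on several engines usually has to probe for a primitive by name rather than assume it. A minimal sketch, using string comparison as the example; \mystrcmp is a hypothetical name invented here, not something any engine or package provides:

```tex
% Sketch: \mystrcmp is a hypothetical name chosen for this example.
\ifx\pdfstrcmp\undefined
  \ifx\strcmp\undefined
    \message{no string comparison primitive (e.g. LuaTeX, Knuth's TeX)}
  \else
    \let\mystrcmp\strcmp        % XeTeX name
  \fi
\else
  \let\mystrcmp\pdfstrcmp       % pdfTeX name
\fi
\ifx\mystrcmp\undefined\else
  \message{comparison of abc and abd: \mystrcmp{abc}{abd}}
\fi
\bye
```

On LuaTeX the comparison would have to be supplied by Lua code instead, which is beyond this sketch.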