[Tex/LaTex] What are the incompatibilities of pdftex, xetex and luatex?

luatex, pdftex, tex-core, xetex

[Please correct me if my statements are wrong]

TL;DR: pdftex, xetex and luatex exhibit different behaviour. What are these differences and why are they present?


I know of three engines that implement TeX and support native PDF output: pdftex, xetex and luatex. Unfortunately they are not completely compatible.

pdftex is an ɛ-TeX engine that is backward compatible with Knuth's TeX, i.e. there will be no difference in the typeset output of pdftex and tex (breakpoints, page breaks, ligatures, dimensions, etc. are exactly the same).

Example of an Incompatibility

This is not true for xetex and luatex. I know of only one obvious incompatibility, which I will illustrate with an MWE in a moment. To prevent a ligature, Knuth suggests several alternatives in the TeXbook, exercise 5.1:

{shelf}ful or shelf{}ful, etc.; or even shelf\/ful, which yields a shelfful (w/ ligature) instead of a shelfful (w/o ligature). In fact, the latter idea—to insert an italic correction—is preferable because TeX will reinsert the ff ligature by itself after hyphenating shelf{}ful. (Appendix H points out that ligatures are put into a hyphenated word that contains no “explicit kerns,” and an italic correction is an explicit kern.) But the italic correction may be too much (especially in an italic font); shelf{\kern0pt}ful is often best.

Thus the following plain TeX document should always be typeset without the ff ligature:

shelf{}ful
\bye

pdftex

The output of tex and pdftex is as expected

[image: "shelfful" typeset without the ff ligature]

luatex

But for luatex I obtain

[image: "shelfful" typeset with the ff ligature]

xetex

For xetex I get the same as for pdftex

[image: "shelfful" typeset without the ff ligature]

but as soon as I load an OpenType font as in

\font\test="CMU Serif" \test
shelf{}ful
\bye

the ligature reappears

[image: "shelfful" typeset in CMU Serif with the ff ligature]

Another example (which turned out to be a bug)

The code

${}\limits$
\bye

will typeset without any complaint under luatex whereas it will throw the error

! Limit controls must follow a math operator.

with pdftex or xetex.

This example will also throw the appropriate error for luatex versions > beta-0.79.1.

Summary

In the end my question boils down to several points:

  • What are the incompatibilities of xetex and luatex with pdftex?
  • Why did the designers choose to break compatibility with Knuth?
  • Can I restore compatibility by making specific design choices (e.g. breaking ligatures with \kern0pt instead of {})?
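
For comparison, the ligature-breaking variants from the TeXbook quote can be collected in one plain TeX file and run through each engine. This is only a sketch for side-by-side testing; which lines actually lose the ff ligature depends on the engine and on the font in use, as described above.

```tex
% Plain TeX test file: process with tex, pdftex, xetex and luatex,
% then compare the four lines of output by eye.
shelfful\par            % reference line: ff ligature expected
shelf{}ful\par          % empty group
shelf\/ful\par          % italic correction (an explicit kern)
shelf{\kern0pt}ful\par  % explicit zero-width kern
\bye
```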

N.B.: Please don't get too caught up by the example. I'm not asking for a solution to this specific problem. If you are searching for a solution to it, visit those links


List of known incompatibilities

Here I collect links to questions/answers which point out some of the incompatibilities. It is intended for users who are contemplating switching engines and want to know what to watch out for.

Best Answer

pdfTeX is intended to offer complete compatibility with Knuth's TeX, and thus if the e-TeX extensions are not enabled should act in the same way.

XeTeX is based on the e-TeX code and does not set out to break any compatibility with Knuth's TeX unless it is absolutely necessary (i.e. there is no reimplementation of algorithms unless this relates to adding new features). However, there are places where differences occur. As noted in the question, XeTeX can load system fonts. When this is done, a new approach to placing boxes on the page is used. Possibly only those with deep involvement in that code can comment on whether it was absolutely necessary not to support the {} approach to breaking ligatures, but as noted that doesn't always work anyway. Changes also occur where the classical TeX syntax is extended. For example, more than two ^ characters are allowed in order to give access to the full Unicode range. However, as shown in http://wiki.contextgarden.net/Encodings_and_Regimes, that leads to code which does different things in 8-bit and Unicode TeX engines:

\def\"{0}\expandafter\def\csname^^^^^00022\endcsname{1}
\ifnum\"=0 \message{tex82}\else\message{newstuff}\fi

LuaTeX is a very different case. The designers have decided to revisit a number of Knuth's decisions: the LuaTeX manual covers the detail (there is quite a bit). For example, the fact that {} does not inhibit a ligature is deliberate and relates to how LuaTeX processes the input and represents it in internal data structures. LuaTeX treats hyphenation as a property of the language, not of the font. As such, LuaTeX can hyphenate words using different fonts if the language does not change. As a result, hyphenation is governed by per-language primitives. (LuaTeX can also hyphenate the first word of a paragraph, which Knuth's TeX does not do and which is nowadays a 'feature'.) LuaTeX also has the same issues as XeTeX in terms of extending primitives for Unicode working: see the demo above for example.

It is worth noting that, as LuaTeX supports callbacks, functionality left unchanged by the engine itself may still be altered by Lua code. An obvious example is the \font primitive. This is not extended by the engine, but is by a Lua-based font loader: plain and LaTeX users share the same code here while ConTeXt has its own (related) loader.

For both XeTeX and LuaTeX it is worth noting that the extension of math mode to allow a number of additional Unicode math parameters to be used (all prefixed \Umath...) means that math mode spacing may change if these additional data points are available, principally when using a Unicode math font.

The bottom line from all of this is that if you have an 8-bit document written for pdfTeX, e-TeX or indeed TeX90 you should be able to use pdfTeX to process it unchanged. XeTeX will give the same result with almost all files of the same form, assuming they don't contain any engine tests or similar, and assuming that they contain no driver-specific code (XeTeX uses the xdvipdfmx driver in all cases, pdfTeX may use dvips, dvipdfmx or direct PDF output). LuaTeX may change the behaviour of such documents, including but not limited to hyphenation, line breaking, ligature formation and so on.
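
An 'engine test' in this sense is code that branches on which engine is running. A minimal sketch, assuming that each engine can be recognised by one characteristic primitive (\pdftexversion, \XeTeXrevision, \directlua); the macro name \enginename is made up for illustration, and real-world code would normally use an existing package for this.

```tex
% Probe for engine-specific primitives using only Knuthian TeX.
% \csname...\endcsname yields a \relax-equivalent for undefined names.
\def\enginename{tex}
\expandafter\ifx\csname pdftexversion\endcsname\relax
\else \def\enginename{pdftex}\fi
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
\else \def\enginename{xetex}\fi
% LuaTeX is tested last so that it wins even if a format
% emulates some pdfTeX names on top of it.
\expandafter\ifx\csname directlua\endcsname\relax
\else \def\enginename{luatex}\fi
\message{Engine: \enginename}
\bye
```

Any document containing a branch like this can legitimately produce different output per engine, which is why such files are excluded from the statement above.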


Looking at the question purely in terms of primitives, we have to decide whether we are comparing XeTeX and LuaTeX with TeX90, e-TeX or pdfTeX 1.40. The question seems to be focussed on 'current' engines, so I will take pdfTeX 1.40 as the 'reference' (it incorporates the e-TeX modifications to TeX90 plus a range of additional primitives). As noted in the part above, some behaviours are changed in XeTeX and LuaTeX. I'll note where possible any TeX90/e-TeX/pdfTeX variations which seem important in this context. Quite a bit of this information is available in the LuaTeX manual.

As XeTeX and LuaTeX allow Unicode input, primitives which are followed by the <number> of a character are affected by the change:

  • \char
  • \lccode
  • \uccode
  • \catcode
  • \sfcode
  • \efcode (LuaTeX-only: see below)
  • \lpcode
  • \rpcode
  • \chardef

These all accept the full Unicode range (up to 0x10FFFF) with the newer engines: pdfTeX, like e-TeX and TeX90, allows only the 8-bit range (maximum 0xFF).
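
As a small illustration of the difference in range, the following sketch only touches code points above 0xFF when a Unicode engine is detected. It assumes that the presence of \Umathcode is an adequate marker for XeTeX or LuaTeX; \greekalpha is a made-up name.

```tex
\expandafter\ifx\csname Umathcode\endcsname\relax
  % TeX90, e-TeX, pdfTeX: the assignments below would be rejected
  % as bad character codes, so they are skipped.
  \message{8-bit engine: character codes stop at "FF}
\else
  \chardef\greekalpha="03B1  % GREEK SMALL LETTER ALPHA
  \lccode"0391="03B1         % lowercase of capital alpha is small alpha
  \catcode"1F600=12          % a code point outside the BMP, made 'other'
  \message{Unicode engine: \number\greekalpha}
\fi
\bye
```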

LuaTeX extends the range of registers allowed beyond that of e-TeX. Thus while pdfTeX and XeTeX allow up to 32767 box, count, dimen, muskip, marks and toks registers, LuaTeX allows a 16-bit range (max is 65535). This affects the primitives

  • \count
  • \dimen
  • \skip
  • \muskip
  • \marks
  • \toks
  • \countdef
  • \dimendef
  • \skipdef
  • \muskipdef
  • \toksdef
  • \box
  • \unhbox
  • \unvbox
  • \copy
  • \unhcopy
  • \unvcopy
  • \wd
  • \ht
  • \dp
  • \setbox
  • \vsplit
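
The register limits can be seen with a short test. This is a sketch under the assumption that pdfTeX is running in e-TeX extended mode (in Knuth's TeX the limit is 255) and that \directlua identifies LuaTeX.

```tex
\expandafter\ifx\csname directlua\endcsname\relax
  % pdfTeX (extended mode) and XeTeX: highest register is 32767.
  \count32767=42
  \message{count32767 = \number\count32767}
\else
  % LuaTeX: highest register is 65535; the same assignment in
  % pdfTeX or XeTeX would give a bad register code error.
  \count65535=42
  \message{count65535 = \number\count65535}
\fi
\bye
```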

The \font primitive is extended by XeTeX to allow loading of system fonts with the syntax

\font⟨name⟩="⟨font identifier⟩⟨font options⟩:⟨font features⟩" ⟨TeX font options⟩

where the ⟨font identifier⟩ may be given in square brackets for a file name or without with a 'friendly' (system) name. This is not the case in LuaTeX: as noted above, LuaTeX is normally used with a Lua-based font loader which modifies the primitive via a callback.
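
A few concrete instances of this syntax, as a sketch for XeTeX only. It assumes that the CMU Serif fonts used in the question are installed and that the file name cmunrm.otf matches the installed regular face; the control sequence names are arbitrary.

```tex
% Run with xetex. Each line loads the same family in a different way.
\font\byname="CMU Serif" at 12pt                      % system font name
\font\bybold="CMU Serif/B" at 12pt                    % font option: bold
\font\byfile="[cmunrm.otf]" at 12pt                   % file name in brackets
\font\mapped="CMU Serif:mapping=tex-text" at 12pt     % font feature after the colon
\byname shelfful \bybold shelfful \byfile shelfful
\mapped ``quoted'' --- dashes
\bye
```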

LuaTeX allows file names to be given in braces as primitive syntax, for example

\input{file name}

This affects the primitives

  • \font (note: this is purely to do with the file name of the font)
  • \input
  • \openin
  • \openout

pdfTeX adds a number of primitives to e-TeX, some related to PDF creation, some for microtypography and some general utilities. As XeTeX is based directly on e-TeX and not on pdfTeX, it only features some of these where they have been ported across. Some of the primitives are also renamed as they are not PDF-related. Thus XeTeX includes the following concepts introduced by pdfTeX:

  • \lpcode
  • \rpcode
  • \pdfpageheight
  • \pdfpagewidth
  • \pdfsavepos
  • \pdflastxpos
  • \pdflastypos
  • \ifincsname
  • \ifprimitive (\ifpdfprimitive in pdfTeX)
  • \primitive (\pdfprimitive in pdfTeX)
  • \strcmp (\pdfstrcmp in pdfTeX)
  • \shellescape (\pdfshellescape in pdfTeX)
  • \normaldeviate (TL'19 onward, \pdfnormaldeviate in pdfTeX)
  • \uniformdeviate (TL'19 onward, \pdfuniformdeviate in pdfTeX)
  • \randomseed (TL'19 onward, \pdfrandomseed in pdfTeX)
  • \setrandomseed (TL'19 onward, \pdfsetrandomseed in pdfTeX)
  • \elapsedtime (TL'19 onward, \pdfelapsedtime in pdfTeX)
  • \resettimer (TL'19 onward, \pdfresettimer in pdfTeX)
  • \filedump (TL'19 onward, \pdffiledump in pdfTeX)
  • \filemoddate (TL'19 onward, \pdffilemoddate in pdfTeX)
  • \filesize (TL'19 onward, \pdffilesize in pdfTeX)
  • \mdfivesum (TL'19 onward, \pdfmdfivesum in pdfTeX)

but not for example \efcode (as noted above), \pdfliteral or many others.
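
Because of the renaming, code that should run under both pdfTeX and XeTeX has to pick the right name at run time. A minimal sketch, assuming e-TeX's \ifdefined is available; \comparestrings is a hypothetical name chosen for this example.

```tex
\ifdefined\pdfstrcmp
  \let\comparestrings\pdfstrcmp   % pdfTeX name
\else\ifdefined\strcmp
  \let\comparestrings\strcmp      % XeTeX name
\fi\fi
\ifdefined\comparestrings
  % Expands to a number describing the relative order of the two strings.
  \message{abc vs abd: \comparestrings{abc}{abd}}
\else
  \message{No string comparison primitive found}
\fi
\bye
```

The same pattern applies to the other renamed primitives in the list.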

LuaTeX is based on pdfTeX and retains some of the primitives introduced there, renames some to remove 'pdf' and drops others. As well as primitives marked as experimental or deprecated in pdfTeX 1.40, LuaTeX also removes the primitives:

  • \pdfelapsedtime
  • \pdfescapehex
  • \pdfescapename
  • \pdfescapestring
  • \pdffiledump
  • \pdffilemoddate
  • \pdffilesize
  • \pdflastmatch
  • \pdfmatch
  • \pdfmdfivesum
  • \pdfresettimer
  • \pdfshellescape
  • \pdfstrcmp
  • \pdfunescapehex

and provides

  • \primitive
  • \ifprimitive
  • \ifabsnum
  • \ifabsdim

without 'pdf' in the name. It also moves all of the 'back end' concepts (to do with producing PDF output) to three new primitives which implement the functionality of the various PDF-related \pdf... primitives from pdfTeX.
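
In recent LuaTeX versions these three primitives are, as far as I know, \pdfextension, \pdfvariable and \pdffeedback. The following sketch shows how a pdfTeX-style \pdfliteral could be recovered on top of \pdfextension; it is modelled on the kind of compatibility layer that LaTeX packages provide, not taken from the engine documentation, so treat the details as an assumption.

```tex
\ifdefined\pdfextension
  % LuaTeX with the new back-end interface.
  \ifdefined\pdfliteral\else
    % Only define the old name if the format has not done so already.
    \protected\def\pdfliteral{\pdfextension literal}
  \fi
\fi
\ifdefined\pdfliteral
  \message{pdfliteral is available}
\else
  \message{no pdfliteral here (e.g. XeTeX)}
\fi
\bye
```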

Currently, XeTeX and pdfTeX use the 'TeX--XeT' model for right-to-left typesetting while LuaTeX uses one derived from Omega/Aleph. As such, LuaTeX does not feature the primitives

  • \TeXXeTstate
  • \beginR
  • \beginL
  • \endR
  • \endL

(Note that there has been suggestion that XeTeX may at some stage move from TeX--XeT to the Omega model.)
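
The two models can be hidden behind one macro. This is a sketch, assuming that \textdir TRT is the appropriate Omega-style switch in LuaTeX; \rtl is a hypothetical helper name, and real bidirectional typesetting needs considerably more than this.

```tex
\ifdefined\TeXXeTstate
  % pdfTeX (extended mode) and XeTeX: TeX--XeT model.
  \TeXXeTstate=1
  \def\rtl#1{\beginR #1\endR}
\else
  % LuaTeX: direction set inside a group, Omega/Aleph style.
  \def\rtl#1{{\textdir TRT #1}}
\fi
Left to right, \rtl{right to left}, and back again.
\bye
```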

LuaTeX also alters the behaviour of \endlinechar and \newlinechar: the maximum value is 127 while setting any value below zero stores -1.

Both XeTeX and LuaTeX add new primitives to TeX and the behaviour of these of course requires the appropriate engine. Note in particular that new primitives for Unicode math handling (\Umath...) are available in both engines. They also both feature \suppressfontnotfounderror.
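
A typical use of \suppressfontnotfounderror is to test for a font without stopping the run. A minimal sketch, assuming that a failed load leaves the control sequence equal to \nullfont; the font name nosuchfont is deliberately bogus and \maybefont is a made-up name. With a Lua-based font loader in LuaTeX the behaviour may differ.

```tex
\ifdefined\suppressfontnotfounderror
  \suppressfontnotfounderror=1
  \font\maybefont=nosuchfont at 10pt
  \ifx\maybefont\nullfont
    \message{Font not found, keeping the current font}
  \else
    \maybefont
  \fi
\else
  \message{This engine has no suppressfontnotfounderror}
\fi
Some text.
\bye
```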