[Tex/LaTex] Migrating from pdfTeX to LuaTeX: Problems with reproducing output for legacy projects

I'm currently analyzing and evaluating a migration process from pdfTeX to LuaTeX for our current TeX workflow. With LuaTeX being a fork of pdfTeX and packages like luainputenc at hand it seemed promising to reproduce pdfTeX's output using LuaTeX when sacrificing (some or most of) LuaTeX's new features.

Currently, however, I'm stuck and need your help to decide whether it's worth digging deeper or accepting what I found out. There are two problems I'm facing.

Here's the first problem. Engines: pdfTeX 3.1415926-2.4-1.40.13 and LuaTeX beta-0.70.2-2012052410 (both from TeX Live 2012) with --output-format=pdf. When using UTF-8 input ([utf8x]{inputenc} for pdfTeX, [utf8x]{luainputenc} for LuaTeX) and T1 encoded fonts ([T1]{fontenc}) the output from both engines differs for some non-T1 characters.

MWE:

\documentclass{article}
\usepackage[utf8x]{luainputenc}
\usepackage[T1]{fontenc}
\renewcommand{\rmdefault}{lmr}
\begin{document}
\begin{tabular}{@{}l*{10}{p{7mm}@{}}}
Some T1 characters:     & \# & \$ & \% & Ă & Ň & § & @ & Æ & ß   & £   \\[1.5mm]
Some non-T1 characters: & ‡  & ÿ  & ‰  & … & ¶ & ½ & ĩ & µ & | | & | | \\
\end{tabular}
\end{document}

Notes: luainputenc forwards to inputenc if called from pdfTeX. The output is the same when using an \ifluatex and loading inputenc and luainputenc separately. The spaces between the bars in the last two columns of the second row are supposed to be a Unicode NO-BREAK SPACE (U+00A0) and a THIN SPACE (U+2009).
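
For reference, the separate loading mentioned in the notes could look roughly like the following. This is only a sketch: it assumes the ifluatex package is used to provide \ifluatex, and it would replace the single luainputenc line in the MWE preamble.

```latex
% Sketch: choose the input encoding package per engine.
% Assumption: \ifluatex comes from the ifluatex package.
\usepackage{ifluatex}
\ifluatex
  \usepackage[utf8x]{luainputenc}% LuaTeX: "UTF-8 legacy mode"
\else
  \usepackage[utf8x]{inputenc}%    pdfTeX: classic inputenc
\fi
```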

pdfTeX output: (everything fine here)

LuaTeX output: (notice the last five columns of the second row)

This problem almost certainly exists for other characters as well; these are just some I encountered.

Is there a way to get the pdfTeX output from LuaTeX? The problem is the same with different T1 encoded fonts, try mathpazo if you want. Using a different font encoding is not an option here as the task at hand is to migrate TeX engines, not fonts or their encodings. Am I perhaps missing a pdfTeX package that has to be replaced for LuaTeX usage?

The lutf8x (notice the leading L) luainputenc package option does not help here either. (Which is no surprise, as its purpose is different.) What I'm trying to use here, according to its manual, is the "UTF-8 legacy mode" of luainputenc, i.e. mimicking the behavior of inputenc in pdfTeX by making non-ASCII characters active to determine the correct byte length of characters and so on.

Maybe the problem is not the input side (afaik luainputenc's job of translating input bytes into LICR) but the output side (afaik translating LICR into glyph positions of the font used)? Maybe LuaTeX is just translating to Unicode positions, not regarding glyph positions of the font, like the EU2 encoding?

Anyway, where does the problem come from? Can it be helped and if so, how?

Btw: It works fine with LuaTeX when using an OpenType-Font with fontspec and EU2 encoding, but that's not the primary goal here.

And here's the second problem. It's closely tied to the first one and can be reproduced using the MWE above. When using --output-format=dvi pdfTeX is producing a DVI file that dvips has no problem with. When using a LuaTeX DVI file dvips stops with something like

This is dvips(k) 5.992 Copyright 2012 Radical Eye Software (www.radicaleye.com)
' LuaTeX output 2012.08.01:1700' -> ohnexml-luatex.ps
dvips: ! invalid char 297 from font ec-lmr10

This is how I actually came up with the suspicion that, for the first problem, Unicode positions and not font-encoding-specific glyph positions are written to the output file: 297 is the decimal Unicode code point of ĩ (U+0129; second row, fourth-last column in the MWE above).

If a solution to the first problem does not solve this problem transitively, how can this one be helped?

Thank you for your thoughts.

Best Answer

First of all, you should imho use utf8 instead of utf8x. utf8x is unmaintained and has problems, e.g. with biblatex. (You will have to set up some missing definitions for pdflatex.) You will also have to add some definitions for lualatex, as it will simply map undeclared chars to their Unicode position, as you already found out. Here are, for example, two definitions for ½ and µ:

\documentclass{article}
\usepackage[utf8]{luainputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\renewcommand{\rmdefault}{lmr}
\DeclareUnicodeCharacter{00BD}{\textonehalf}
\DeclareUnicodeCharacter{00B5}{\textmu}
\begin{document}
\begin{tabular}{@{}l*{10}{p{7mm}@{}}}
Some T1 characters:     & \# & \$ & \% & Ă & Ň & § & @ & Æ & ß   & £   \\[1.5mm]
Some non-T1 characters: & ‡  & ÿ  & ‰  & … & ¶ & ½ & µ %ĩ &  & | | & | | \\
\end{tabular}
\end{document}
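
The other characters from the question could presumably be handled the same way. The following is an untested sketch for the preamble, not a verified fix: the choice of replacement commands is an assumption, and some of them (\textparagraph, for instance) rely on textcomp being loaded.

```latex
% Assumed additional mappings; place after \usepackage{textcomp}.
\DeclareUnicodeCharacter{2021}{\textdaggerdbl}   % ‡ double dagger
\DeclareUnicodeCharacter{2030}{\textperthousand} % ‰ per mille sign
\DeclareUnicodeCharacter{2026}{\textellipsis}    % … horizontal ellipsis
\DeclareUnicodeCharacter{00B6}{\textparagraph}   % ¶ pilcrow
\DeclareUnicodeCharacter{0129}{\~{\i}}           % ĩ as tilde accent over dotless i
\DeclareUnicodeCharacter{00A0}{\nobreakspace}    % NO-BREAK SPACE
\DeclareUnicodeCharacter{2009}{\,}               % THIN SPACE
```

With declarations like these in place, the commented-out ĩ and the two space columns in the table row above could be re-enabled one at a time to see which ones lualatex accepts.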
