[TeX/LaTeX] Why does inputenc give up so quickly under “utf8 based engines”?

font-encodings input-encodings xetex

Why do I need the extra work starting at \ifdefined to get my French guillemets right in the PDF output, when using xelatex with a source that specifies T1-encoded fonts?

\documentclass[french]{article}

    \usepackage[T1]{fontenc}
    \usepackage[utf8]{inputenc}

\ifdefined\XeTeXinterchartoks
     \catcode`« \active
     \catcode`» \active
     \def«{\char19 }
     \def»{\char20 }% works, even with babel+frenchb
\fi

\usepackage{newtxtext}

\usepackage{babel}
\frenchbsetup{og=«, fg=»}

\begin{document}

\showboxbreadth\maxdimen
\showboxdepth\maxdimen
\showoutput

«coucou»
\end{document}

[Image: French guillemets]

The log contains:

Package: inputenc 2015/03/17 v1.2c Input encoding file
\inpenc@prehook=\toks14
\inpenc@posthook=\toks15


Package inputenc Warning: inputenc package ignored with utf8 based engines.

But inputenc is loaded after fontenc, and using fontenc with xelatex is not forbidden. So inputenc should know that T1-encoded font slots are to be used. Why, then, doesn't it make these characters active and map them to the suitable \char xx slots?

I am missing something here…

Notice that the code sample also uses babel+frenchb, which adds automatic spacing. It does not seem to have been disturbed by my making the characters active.

To explain the issue further, consider the following input:

\documentclass{article}

    \usepackage[T1]{fontenc}
    \usepackage[utf8]{inputenc}

\begin{document}

\showboxbreadth\maxdimen
\showboxdepth\maxdimen
\showoutput

«coucou»

\end{document}

It produces, if compiled with xelatex:

[Image: bad guillemets]

The explanation is simple: the characters « and » have code points 171 and 187 respectively (they are not ASCII). XeTeX therefore takes whatever glyphs the T1-encoded font has in slots 171 and 187, which gives the result above. inputenc does nothing, but it could have done something akin to my code above.

...\hbox(6.63332+0.0)x345.0, glue set 290.00977fil
....\hbox(0.0+0.0)x15.0
....\T1/cmr/m/n/10 «
....\T1/cmr/m/n/10 c
....\T1/cmr/m/n/10 o
....\T1/cmr/m/n/10 u
....\T1/cmr/m/n/10 c
....\T1/cmr/m/n/10 o
....\T1/cmr/m/n/10 u
....\T1/cmr/m/n/10 »
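
As a quick check of the code points mentioned above, here is a minimal sketch (to be compiled with xelatex; it is not part of the original example) that asks TeX for the character codes directly:

\documentclass{article}
\begin{document}
% \number`<char> gives the character code TeX sees for <char>;
% the {} keeps TeX from swallowing the space that follows
\number`«{} and \number`»
\end{document}

With a Unicode engine this should typeset 171 and 187, the very slots that the T1-encoded font is then asked for.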

Best Answer

inputenc is ignored because it has absolutely nothing useful to do with XeLaTeX or LuaLaTeX. More precisely, it would do harm!

See fontenc vs inputenc

Essentially, the task performed by inputenc is translating input characters into their LICR (LaTeX internal character representation) form. With an 8-bit engine, « is two bytes long in UTF-8, and inputenc is able to translate it into \guillemotleft, and » into \guillemotright. To do so, it must make some characters active. This is exactly what your \ifdefined block does by hand; inputenc is not instructed to do it under XeTeX, because it was designed for 8-bit engines.
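
To illustrate what the LICR form buys you, here is a minimal sketch (not taken from the question) that uses the LICR commands directly. It needs no active characters: fontenc resolves \guillemotleft and \guillemotright to the right T1 slots, whichever engine is used.

\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% LICR commands: the current font encoding decides which slot is used
\guillemotleft coucou\guillemotright
\end{document}

Everything below is only about mapping the typed characters « and » onto these two commands.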

Here is a friendlier interface, using newunicodechar:

\documentclass[french]{article}

\usepackage[T1]{fontenc}
\usepackage{newunicodechar}

\newunicodechar{«}{\guillemotleft}
\newunicodechar{»}{\guillemotright}

\usepackage{newtxtext}

\usepackage{babel}
\frenchbsetup{og=«, fg=»}

\begin{document}

«coucou»

\end{document}


If your aim is to provide translations for all the characters listed in t1enc.dfu, you can use newunicodechar in a different way: define \DeclareUnicodeCharacter in terms of it and input the file.

\documentclass[french]{article}

\usepackage[T1]{fontenc}
\usepackage{newunicodechar}

\newcommand\DeclareUnicodeCharacter[2]{%
  \expandafter\newunicodechar\Uchar"#1{#2}%
}
\input{t1enc.dfu}

\usepackage{newtxtext}

\usepackage{babel}
\frenchbsetup{og=«, fg=»}

\begin{document}

«coucou»

\end{document}
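
To see why this works, consider one entry of the kind found in t1enc.dfu. The trace below is a sketch of what happens under XeTeX; the exact entries depend on the LaTeX release installed, so treat the specific line as an assumption.

% an entry in t1enc.dfu of the form
%   \DeclareUnicodeCharacter{00AB}{\guillemotleft}
% becomes, with the definition of \DeclareUnicodeCharacter above,
%   \expandafter\newunicodechar\Uchar"00AB{\guillemotleft}
% \Uchar"00AB expands to the character «, so this is the same as
\newunicodechar{«}{\guillemotleft}

In other words, each line of the file ends up as one \newunicodechar declaration like the two written by hand in the previous example.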

A proof of concept for a package xeinputenc

\ProvidesPackage{xeinputenc}[2015/12/12]
\RequirePackage{newunicodechar}

\AtBeginDocument{\xeinputenc@process}

\newcommand{\xeinputenc@process}{%
  \begingroup
  \gdef\xeinputenc@list{}%
  \def\cdp@elt##1##2##3##4{%
    \g@addto@macro\xeinputenc@list{\lowercase{\xeinputenc@input{##1}}}%
  }%
  \cdp@list
  \aftergroup\xeinputenc@list
  \endgroup
}

\newcommand{\DeclareUnicodeCharacter}[2]{%
  \expandafter\newunicodechar\Uchar"#1{#2}%
}

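\newcommand{\xeinputenc@input}[1]{%
  % remember the current catcode of the space character
  \edef\xeinputenc@spacecatcode{\the\catcode`\ }%
  \InputIfFileExists
    {#1enc.dfu}
    {\wlog{... processing UTF-8 mapping file for font encoding #1}\catcode`\ 9\relax}%
    {\wlog{... no UTF-8 mapping file for font encoding #1}}%
  % restore it, so that ignoring spaces does not leak into the document body
  \catcode`\ =\xeinputenc@spacecatcode\relax
}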


\@onlypreamble\DeclareUnicodeCharacter
\@onlypreamble\xeinputenc@list
\@onlypreamble\xeinputenc@process
\@onlypreamble\xeinputenc@input
\endinput
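
The package builds its work list from \cdp@list, LaTeX's internal list of declared font encodings. As a rough sketch, assuming (hypothetically) that only T1 and OT1 had been declared, the situation at \begin{document} would be:

% \cdp@list holds one \cdp@elt entry per declared encoding, e.g.
%   \cdp@elt{T1}{...}{...}{...}\cdp@elt{OT1}{...}{...}{...}
% so \xeinputenc@list ends up as
%   \lowercase{\xeinputenc@input{T1}}\lowercase{\xeinputenc@input{OT1}}
% which, once \lowercase has acted, amounts to
\xeinputenc@input{t1}%  reads t1enc.dfu if it exists
\xeinputenc@input{ot1}% reads ot1enc.dfu if it exists

The list is executed via \aftergroup, that is, after the group has ended, so the \newunicodechar definitions made while reading the files are not lost when the group closes.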

Now your test document can be

\documentclass[french]{article}

\usepackage{xeinputenc}

\usepackage{newtxtext}

\usepackage{babel}
\frenchbsetup{og=«, fg=»}

\begin{document}

«coucou»

\end{document}

No explicit loading of fontenc is needed in this case, because this is already taken care of by newtxtext, but calls to it will be honored.