[TeX/LaTeX] Fixing \mathit spacing with unicode-math

lhs2tex math-mode opentype unicode-math xetex

I am using lhs2TeX to convert code into LaTeX. It typesets code into math mode. As I am also writing math formulas in the same document, I use unicode-math to switch between different math mode fonts in order to make the two things more distinct (and also to use Unicode symbols in my math).

lhs2TeX sets variable identifiers using a macro \Varid, which is usually set to \mathit.

The problem is that with unicode-math, \mathit does not work correctly. I am unsure whether the cause lies in unicode-math or in the OpenType math fonts, but the result is that its argument is not typeset as a word of text but like a formula, i.e. as a product of single-letter variables with additional spacing between the letters. I also found a bug report on this.

I could redefine \Varid with \textit, but this leads to another problem: there are a lot of subscripts in my identifiers, and since the underscore does not produce subscripts in text mode, this doesn't work, at least not without overriding every identifier with custom TeX to fix the subscripts.

Here is an example demonstrating the problem.

\documentclass{minimal}

\usepackage{fontspec}
\setmainfont{TeX Gyre Pagella}

\usepackage{unicode-math}
\setmathfont[version=lm]{Latin Modern Math}
\setmathfont[version=pg]{TeX Gyre Pagella Math}


\begin{document}

\mathversion{lm}
% math in latin modern
    $x ∼ y$

\mathversion{pg}
% haskell code in tex gyre pagella

    \textit{factorial}

    $\mathit{factorial}$ \quad This should look like the former, not the latter.

    $factorial$

\mathversion{lm}
% latin modern again
    $x ∼ y$

\end{document}

Screenshot of the Output
(source: goedderz.info)

So the question is: Can \mathit be fixed in this setting? If not, can I write a macro that works in math mode but chooses the correct font? Or can I write a macro that locally redefines the underscore to do subscripts in text mode?

Edit: A main problem here, which unfortunately isn't part of my example, is that the TeX code inside \Varid (which can be redefined) is usually generated automatically and often contains _ for subscripts (otherwise it would have to be rewritten manually for every identifier, which I am trying to avoid). So using \text instead is problematic.

Edit: Actually, that is not quite true, as something like x1 gets translated to \Varid{x}_1, which works when defining \Varid as something like

\newfontfamily\haskvarfont{TeX Gyre Pagella}
\renewcommand{\Varid}[1]{\text{\haskvarfont\emph{#1}}}
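
For reference, here is a self-contained sketch of that definition. It rests on a few assumptions that are not part of lhs2TeX itself: amsmath is loaded explicitly to provide \text, the \providecommand line is only a stand-in so that the file compiles without lhs2TeX, and the TeX Gyre fonts can be found by name (which may need LuaLaTeX or system-installed fonts).

```latex
\documentclass{article}

\usepackage{amsmath}        % provides \text (assumption: loaded explicitly)
\usepackage{fontspec}
\usepackage{unicode-math}
\setmathfont{TeX Gyre Pagella Math}

% Text font used for Haskell identifiers.
\newfontfamily\haskvarfont{TeX Gyre Pagella}

% Stand-in for the lhs2TeX definition so this file compiles on its own.
\providecommand{\Varid}[1]{\mathit{#1}}

% Typeset the identifier as italic text inside math mode.
\renewcommand{\Varid}[1]{\text{\haskvarfont\emph{#1}}}

\begin{document}

% The subscript is outside \Varid, so it is still handled by math mode.
$\Varid{x}_1 + \Varid{factorial}$

\end{document}
```

One caveat with this sketch: \text picks up the surrounding text font and \emph toggles emphasis, so inside already italic text (a theorem statement, for example) the identifier would likely come out upright. Using \textit or \itshape in place of \emph avoids that.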

Best Answer

This is the consequence of a bad decision in unicode-math: it uses the name \mathit for the normal math italic letters instead of respecting the LaTeX convention of calling them \mathnormal. For unicode-math to be a drop-in replacement, \mathit should select the text italic font.

The output should not depend on whether unicode-math is loaded, but this simple example shows that it does:

\documentclass{article}
%\usepackage{unicode-math}

\begin{document}

$\mathit{different}$

$different$

\end{document}

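If the line with unicode-math is commented out, we get

[image: \mathit{different} set as a text italic word; $different$ set as spaced-out math italic letters]

If I uncomment the line, I get

[image: both lines set as spaced-out math italic letters]

which is definitely wrong.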

Workaround:

Define a new math alphabet:

\documentclass{article}
\usepackage{unicode-math}

\DeclareMathAlphabet{\Lmathit}{\encodingdefault}{\familydefault}{m}{it}

\begin{document}

$\Lmathit{different}$

$different$

\end{document}

If you use lhs2TeX, you can add

\renewcommand{\Conid}[1]{\Lmathit{#1}}
\renewcommand{\Varid}[1]{\Lmathit{#1}}

after loading it.

This shouldn't raise the "Too many math alphabets" error; if it does, add the code you find between \makeatletter and \makeatother in https://tex.stackexchange.com/a/100428/4427
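
Putting the pieces together, a minimal sketch of the workaround with subscripted identifiers could look as follows. The two \providecommand lines are only stand-ins (an assumption, so that the file compiles without lhs2TeX); in a real document lhs2TeX's own definitions would already be in place and only the \renewcommand lines are needed.

```latex
\documentclass{article}
\usepackage{unicode-math}

% Text italic as a math alphabet, as in the workaround above.
\DeclareMathAlphabet{\Lmathit}{\encodingdefault}{\familydefault}{m}{it}

% Stand-ins for the lhs2TeX macros so this file compiles on its own.
\providecommand{\Varid}[1]{\mathit{#1}}
\providecommand{\Conid}[1]{\mathit{#1}}

% Route identifiers through the text italic math alphabet.
\renewcommand{\Conid}[1]{\Lmathit{#1}}
\renewcommand{\Varid}[1]{\Lmathit{#1}}

\begin{document}

% Identifiers are set as words; the subscript stays outside \Varid,
% so it remains ordinary math-mode material.
$\Varid{factorial}\;\Varid{x}_1$

% For comparison: a product of single-letter variables.
$factorial$

\end{document}
```

Since \Lmathit is a math alphabet and not a text command, the argument stays in math mode, so an underscore inside or after \Varid should still produce a subscript.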