[TeX/LaTeX] unicode-math and tex4ht with utf-8 input

Tags: input-encodings, tex4ht, unicode, unicode-math, xetex

I have a document that I compile to PDF with LuaTeX, and I want to create an HTML version from the same sources. The text is German with umlauts, and for convenience I use some unicode-math (also, hyperref doesn't like $\phi^2$ in the TOC but has no issue with $φ²$). Here is a minimal example, which compiles with lualatex file:

\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{unicode-math}
\begin{document}
Löse $\sqrt{φ} = 1$
\end{document}

Now, to compile it using htlatex file "xhtml,mathml,charset=utf-8", I add inputenc so that the ö isn't ignored. But unicode-math isn't compatible with pdfTeX, so I get an error for the φ; the error can be ignored, but the φ is then missing from the output.

\documentclass{article}
\usepackage[ngerman]{babel}
\makeatletter
\@ifpackageloaded{tex4ht}{
    \usepackage[utf8]{inputenc}
}{
    \usepackage{unicode-math}
}
\makeatother
\begin{document}
Löse $\sqrt{φ} = 1$
\end{document}

So I use htxelatex file "xhtml,mathml,charset=utf-8", without inputenc because XeTeX already expects UTF-8 input and does indeed complain when inputenc is used. However, now ö and φ are silently ignored in the output?! Compiling this code using xelatex file does give the expected result though, so it's a tex4ht problem.

And, I guess as a consequence of that, unicode-math doesn't work either, giving lots of errors which when ignored lead to no output at all.

\documentclass{article}
\makeatletter
\usepackage[ngerman]{babel}
\@ifpackageloaded{tex4ht}{
    %\usepackage{unicode-math}
}{
    \usepackage{unicode-math}
}
\makeatother
\begin{document}
Löse $\sqrt{φ} = 1$
\end{document}

The unicode-specific options mentioned in the manual, i.e. -cunihtf, don't have any effect either… How do I make this work?

Or is that what the manual means by “partial support” for XeTeX (and why does it work for pdfTeX then?!). The only reason I'm using XeTeX here is that there doesn't seem to be any LuaTeX support in tex4ht, but could I maybe use LuaTeX to create the DVI for tex4ht anyway?


Things get weirder.

\documentclass{article}
\usepackage{newunicodechar}
\newunicodechar{ö}{\"o}
\newunicodechar{φ}{\phi}
\begin{document}
Löse $\sqrt{φ} = 1$
\end{document}

This compiles with htxelatex testhtml "xhtml,mathml,charset=utf-8" and inserts ϕ for φ; however, the ö is inserted as a Latin-1 encoded character. I have no idea where that comes from.

htxelatex testhtml "xhtml,mathml,charset=latin-1" only changes the charset in the header; the file is then displayed correctly.

Adding " -cunicode -utf8" as the third option causes the φ to be inserted as UTF-8, but doesn't change the latin-1 ö.

Best Answer

The problem is that tex4ht, the application that converts the DVI file to HTML, doesn't support OpenType fonts, and compilation fails when one is used. Because there seems to be nobody who understands the tex4ht C source well enough to fix this bug, the only solution is to hack unicode-math so that it doesn't use OpenType fonts with tex4ht.

I hacked fontspec in a similar way and it worked with TeX Live 2012. I am not sure about TeX Live 2013, since many of the packages involved were upgraded (see the package code and info page). I also tried to hack unicode-math, but I failed.

You can add some support with \DeclareUnicodeCharacter, but it would fail in some cases, as you noted.
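
As a rough sketch of the \DeclareUnicodeCharacter approach, assuming the pdfTeX route (htlatex) with inputenc, the φ can be mapped by its code point. The mapping to \phi is my assumption about what is wanted here. One case where it fails: \phi is a math-only command, so a φ outside math mode raises an error.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
% Assumption: map U+03C6 (φ) to \phi; only valid in math mode.
% The ö needs no declaration, inputenc already knows it.
\DeclareUnicodeCharacter{03C6}{\phi}
\begin{document}
Löse $\sqrt{φ} = 1$
\end{document}

Every further non-ASCII math symbol would need its own declaration, which is why this only gives partial support.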

Edit: For your edited question, I can compile your example correctly if I edit it a little:

\documentclass{article}
\ifdefined\HCode
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{newunicodechar}
\newunicodechar{ö}{\"o}
\newunicodechar{φ}{\phi}
\else
\usepackage{unicode-math}
\setmathfont{Asana-Math.otf}
\fi
\begin{document}
Löse $\sqrt{φ} = 1$
\end{document}

and compile with

htxelatex filename "xhtml, mathml, charset=utf-8" " -cunihtf -utf8"
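
If the document needs to branch on the HTML build in more than one place, the \ifdefined\HCode test can be stored in a switch once and reused. This is only a sketch: the name \ifhtmlbuild is hypothetical and not part of tex4ht, and it relies on the same assumption as the example above, namely that \HCode is defined only when tex4ht drives the run.

\documentclass{article}
% Hypothetical switch (not provided by tex4ht); false by default.
\newif\ifhtmlbuild
\ifdefined\HCode
  \htmlbuildtrue   % \HCode exists, so tex4ht is driving this run
\fi
\ifhtmlbuild
  % HTML build: legacy 8-bit fonts plus per-character mappings
  \usepackage[utf8]{inputenc}
  \usepackage[T1]{fontenc}
  \usepackage{newunicodechar}
  \newunicodechar{ö}{\"o}
  \newunicodechar{φ}{\phi}
\else
  % PDF build: OpenType math via unicode-math
  \usepackage{unicode-math}
\fi
\begin{document}
Löse $\sqrt{φ} = 1$
\ifhtmlbuild\else\par Nur in der PDF-Version.\fi
\end{document}

The switch has to be declared at the top level, before the first conditional that uses it, so that TeX never meets an undefined \ifhtmlbuild while skipping a branch.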