[Tex/LaTex] Why does “ff” display strangely with Unicode encoding vs. iso-8859-1 in HTML output from tex4ht?

input-encodings, iso, ligatures, tex4ht, unicode

I understand that one should (?) be using Unicode for HTML. But using it with tex4ht always makes "ff" look strange. Why does this happen for "ff"? And, based on the test below, should one instead use iso-8859-1 encoding when using htlatex to generate HTML from LaTeX, or am I perhaps not using the correct combination of options?

Given this LaTeX file:

\documentclass{article}%
\begin{document}
\title{Kamke differential equations}
\maketitle
\end{document}

Here are 5 different htlatex commands: three use utf8, one does not specify any encoding (the default), and one uses iso-8859-1. The iso-8859-1 option generated the best result. This is on Windows using Firefox.

[Image: the five htlatex commands and their results in the browser]

References

http://www.tug.org/applications/tex4ht/mn-commands.html

I am using TeX Live 2012 on Debian to compile the LaTeX files.


Update

The result of the provided test below is shown in this image. It shows a problem.

I've put the zip file, which contains all the output and the HTML file, in this folder:

http://12000.org/tmp/061613/

I am using Firefox 20 on Windows 7.

[Image: result of the provided test]

Best Answer

As to the “why” question: under some circumstances the letter pair “ff” is replaced by the single character “ﬀ” (LATIN SMALL LIGATURE FF, U+FB00). That is understandable in a sense, but then things go wrong.

In cases 3 and 4, the character is presumably written correctly in the HTML document, but the font being used does not contain it, so the browser picks up a glyph from a backup font. This probably depends on font settings, which were not disclosed in the question.

In case 2, the character is written UTF-8 encoded, as the bytes 0xEF 0xAC 0x80, but these bytes then get interpreted according to windows-1252, yielding “ï¬€”. The reason is that the character encoding has been declared incorrectly, or maybe not at all, forcing browsers to guess, and they may guess wrong.
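The byte-level mix-up in case 2 can be reproduced outside the browser. The following is only a small sketch in Python (not part of tex4ht or of the original test; the variable names are made up for illustration). It encodes U+FB00 as UTF-8 and then decodes the same bytes once as windows-1252 and once as UTF-8.

```python
# LATIN SMALL LIGATURE FF, the character discussed in the answer.
ligature = "\ufb00"

# Its UTF-8 encoding: the three bytes 0xEF 0xAC 0x80.
utf8_bytes = ligature.encode("utf-8")
print(" ".join(f"{b:02X}" for b in utf8_bytes))

# What a browser ends up with if it assumes windows-1252 for these bytes:
# three separate characters (i with diaeresis, not sign, euro sign).
misread = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in misread])

# With the matching encoding, the single ligature character comes back.
correct = utf8_bytes.decode("utf-8")
print(correct == ligature)
```

So the file contents are the same in both readings; only the encoding the browser assumes differs.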

Using U+FB00 is understandable but questionable. Such characters used to be the only way to get ligatures in HTML documents, but they only work when the font used contains them. Nowadays you can use font-feature-settings in CSS. Although these settings are still relatively poorly supported (in fonts and in browsers), they are safe in the sense that when they fail, “ff” simply gets displayed as a plain “ff”, not in a fancy incorrect way.
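If the ligature characters are unwanted in the generated HTML, one possible workaround is to replace them with plain letters after the fact. This is a hedged sketch in Python, not something tex4ht does itself and not something suggested in the question. The function name `expand_ligatures` is hypothetical, and the choice to cover only U+FB00 to U+FB04 is an assumption.

```python
import unicodedata

# Latin f-ligatures U+FB00..U+FB04 (ff, fi, fl, ffi, ffl), each mapped to
# its compatibility decomposition, i.e. the plain letters it stands for.
LIGATURES = {
    cp: unicodedata.normalize("NFKD", chr(cp))
    for cp in range(0xFB00, 0xFB05)
}


def expand_ligatures(text: str) -> str:
    """Return text with the f-ligature characters replaced by plain letters."""
    return text.translate(LIGATURES)


# Example: a title containing U+FB00 becomes plain ASCII again.
title = "Kamke di\ufb00erential equations"
print(expand_ligatures(title))
```

Only these five code points are touched, so other non-ASCII text in the document is left alone. Applying full NFKC normalization to the whole file would change many other characters as well.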