[Tex/LaTex] TeX4ht (htlatex) on MiKTeX 2.9 to convert LaTeX into MathML, and then into Office MathML with Word 2010

htlatex, html, latex-to-word, miktex, tex4ht

I managed to use TeX4ht via the htlatex command on MiKTeX 2.9 to successfully convert a LaTeX document into an HTML document. The TeX file is a fairly simple test document with an integral in display format. However, I'm using this method as an indirect means to ultimately convert LaTeX into Word with Office MathML (OMML) math. The idea is to convert the TeX file into HTML, and then simply open the HTML in Word 2010 and save it as DOCX.

Unfortunately, I can't get the whole chain (LaTeX to HTML with MathML, then to DOCX with Office MathML) to work properly. I do get a nice HTML file, but that's where I'm stuck.

Here's what I tried. Without options, htlatex generates a picture in PNG format rather than MathML embedded in the HTML file. Word keeps the picture and no OMML is created.

htlatex with the option "html,mathml" creates an HTML file with MathML and no bitmap figure, but Word 2010 cannot read it properly: the resulting formulas look crudely flattened into text with Greek letters and inline symbols.

The option "xhtml,mathml" also gives problems: Word reports something along the lines of "DTD prohibited" when opening the HTML file. I tried "xhtml,ooffice,mathml" with no better luck.

I wonder if I should use mk4ht mzlatex instead of htlatex. I haven't tested this alternative because mk4ht on Windows seems to require a Perl interpreter, and it seems pretty much equivalent to htlatex with the option "xhtml,mozilla" anyway, as indicated in https://stuff.mit.edu/afs/athena/system/i386_deb50/os-ubuntu-9.04/usr/share/doc/tex4ht/html/. htlatex doesn't require Perl.

DETAILS

  1. Installed MiKTeX 2.9 on Windows 7, adding both miktex-tex4ht-bin-2.9 and miktex-tex4ht-base-2.9 through the package manager in administrator mode.

  2. Created a test.tex file in E:\downloads, as follows:

    \documentclass{report}
    \begin{document}
    Hello there. This is a test of $x_i^2=3$, where
    $$\int_0^\infty f(x) = 1.$$
    \end{document}
    
  3. Converted the TeX file into an HTML file with the following commands at the Windows command prompt. Note that htlatex accepts file names but not path names, so the current directory must first be set to the folder containing the TeX file. If a path name is given for the test file, e.g., e:\downloads\test.tex, the error "Undefined control sequence" is produced, presumably because TeX reads the backslash-prefixed parts of the path as control sequences.

    e:
    cd Downloads
    "C:\Program Files (x86)\MiKTeX 2.9\scripts\tex4ht\htlatex.bat" test.tex
    
  4. The conversion is successful, but the integral formula appears as a bitmap file, instead of being embedded as MathML.

  5. Tried the following instead, which does generate MathML code instead of the image, but Word 2010 cannot recognize it properly and convert it into Office MathML:

    "C:\Program Files (x86)\MiKTeX 2.9\scripts\tex4ht\htlatex" test.tex "html,mathml"
    

Best Answer

I was able to open your document in Word 2010. First, I converted the document to ODT format. Because you don't have the mk4ht command available, you need to use the following htlatex call:

    htlatex test.tex "xhtml,ooffice" "ooffice/! -cmozhtf" "-cooxtpipes -coo"

All of these parameters are necessary; they are used when post-processing the XML file that holds the document.
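Since htlatex takes a file name rather than a path (as noted in the question), the call has to be made from the folder that contains the TeX file. A minimal sketch of a wrapper that does this is shown here. The function names `build_htlatex_call` and `run_htlatex` are hypothetical, the argument strings are copied from the call above, and Windows-style paths are assumed.

```python
import subprocess
from pathlib import PureWindowsPath

# Arguments copied verbatim from the htlatex call in this answer.
ODT_ARGS = ("xhtml,ooffice", "ooffice/! -cmozhtf", "-cooxtpipes -coo")


def build_htlatex_call(tex_path, htlatex="htlatex", extra_args=ODT_ARGS):
    """Return (command, working_directory) for converting tex_path.

    htlatex receives only the bare file name; the directory part of the
    path is returned separately so it can be used as working directory.
    """
    path = PureWindowsPath(tex_path)
    command = [htlatex, path.name, *extra_args]
    return command, str(path.parent)


def run_htlatex(tex_path, htlatex="htlatex"):
    """Run the conversion (assumes htlatex can be started by this name)."""
    command, workdir = build_htlatex_call(tex_path, htlatex)
    # cwd= replaces the manual "e:" and "cd Downloads" steps.
    return subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_htlatex(r"e:\downloads\test.tex")` would then run the conversion inside `e:\downloads`. If htlatex is only available as a batch file, as in the question, it may be necessary to pass its full path, including the `.bat` extension, as the `htlatex` argument.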

When I tried to open the resulting document with Word 2010, only crossed-out gray boxes were shown in place of the math. I then opened test.odt and saved it again with OpenOffice (3.2) and with LibreOffice (3.6). In both cases, the saved file was almost twice as big. The document saved by OpenOffice opened in Word without problems. For the document saved by LibreOffice, Word reported incorrect content, but then asked whether I wanted to repair it and opened it correctly as well.

I took a look at the ODT files saved by OpenOffice and LibreOffice, and it seems that the math was extracted from the main document, with each math fragment saved in a standalone file with some additional metadata. While OpenOffice and LibreOffice are fine with the ODT files produced by tex4ht, Word obviously isn't: its ODT support needs the math in a particular format. I don't know whether current Word versions behave the same way. It is probably a bug in Word's ODT support, but I don't know the ODT standard well enough to judge. If your Word doesn't support math in ODT files produced by tex4ht, try opening and re-saving the file with OpenOffice.
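To check which of the two layouts a given file uses, one can look inside it, since an ODT file is a ZIP archive. The following is a rough sketch, not a validator. It assumes that the extracted math fragments show up as sub-folders that each contain their own `content.xml` (the folder name `Object 1` in the comment is only an example of what such an entry might look like), and the function name `list_embedded_objects` is made up for illustration.

```python
import zipfile


def list_embedded_objects(odt_file):
    """Return the content.xml parts stored below the package root.

    The main text lives in the top-level "content.xml"; any other
    "<folder>/content.xml" entry (for example "Object 1/content.xml")
    is treated here as an embedded sub-document.
    """
    with zipfile.ZipFile(odt_file) as package:
        return sorted(
            name for name in package.namelist()
            if name.endswith("/content.xml")
        )
```

Under the assumption above, an empty list for the file written by tex4ht and a non-empty list for the re-saved file would match the observation that the office suites move the math out of the main document.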
