[TeX/LaTeX] The 2014 status on converting Word and MathType to LaTeX

conversion import

Why have I started a new question?

I have read as many pages here on this question as I can, and none has a really satisfactory answer. I want to start a new discussion of the problem.

Why is it that important?

If we really believe that TeX/LaTeX is the prince of typesetting and want to convince others of this, then there has to be some kind of pathway to conversion if you already have a large body of work in another system.

Summary of my particular problem

I have a partly written physics textbook comprising a collection of very large Word documents with hundreds of graphics and hundreds of MathType equations. Having been converted to TeX/LaTeX, I just can't go back to working in Word. In fact, my study/work laptop is a Microsoft-free experiment that is also in the process of becoming Adobe-free, which is a trickier prospect. I really need to find a conversion solution for those documents.

The Question

I suppose I should point out that Mac solutions would be preferable, but I do have ready access to a Windows machine. Keeping in mind that I have already researched this fairly extensively on this site, does anyone know of any up-to-date solutions? I believe that in theory it must be possible to convert a Word document with styles, graphics and MathType equations into a reasonable .tex file, one that might still need significant refinement but not massive, fundamental rewriting.

Word's styles must have some kind of specification that could be translated at least partially to LaTeX styles. Various graphics converters exist. MathType has a converter for its equations to LaTeX. These three components combined could surely produce at least a decent starting point for rewriting a large document.
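To illustrate the first of those three components, here is a minimal sketch of reading paragraph styles straight out of a .docx file, which is a zip archive whose main text lives in `word/document.xml`. The style ids (`Heading1` and so on) and the LaTeX commands they map to are assumptions for illustration; a real document may use different style ids, and the sketch does no escaping of special LaTeX characters.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri} notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical mapping from Word style ids to LaTeX commands.
STYLE_TO_LATEX = {
    "Heading1": "\\chapter",
    "Heading2": "\\section",
    "Heading3": "\\subsection",
}


def paragraphs_to_latex(document_xml):
    """Turn the paragraphs of a word/document.xml payload into LaTeX lines."""
    root = ET.fromstring(document_xml)
    lines = []
    for para in root.iter(W + "p"):
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val") if style is not None else None
        # A paragraph's text is split across runs; join all <w:t> pieces.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if not text.strip():
            continue
        command = STYLE_TO_LATEX.get(style_id)
        lines.append(f"{command}{{{text}}}" if command else text)
    return lines


def docx_to_latex_lines(docx):
    """Accept a path or file-like object pointing at a .docx archive."""
    with zipfile.ZipFile(docx) as archive:
        return paragraphs_to_latex(archive.read("word/document.xml"))
```

Calling `docx_to_latex_lines` on a chapter file would give a list of heading commands and plain body paragraphs. Graphics, tables and equations are ignored entirely here.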

Why this matters to me personally

My text is already 260 A4 pages. With what I've learned as a LaTeX user about the rules of typesetting and the research into readability behind them, even a perfect translation would have to be reorganised into about 400 pages. This is because there is far too much on each page: far too many words per line, and too many complications in the layout of equations and diagrams.

A solution that at least converts headings, paragraph styles, equations and graphics, leaving me to restructure the pages and fine-tune, would be brilliant.

I'm well aware of the irony of my situation. Why should I expect that someone in a similar situation to mine has created a solution to save me the trouble?

Conclusion

Not a final conclusion. I will keep adding to this as the story progresses. So far, a combination of docx2tex and GraphicConverter gets me the diagrams, and writer2latex gets me the headings and body text. If I get MathParser working, I will still need to find a utility that converts MathType equations from Word to MathML. That would get me a significant way towards a worthwhile conversion.
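For a sense of what the MathML-to-LaTeX step involves, here is a minimal sketch that walks a presentation MathML tree and emits LaTeX. It is not how MathParser works; it is an illustration covering only a handful of elements (`mi`, `mn`, `mo`, `mrow`, `mfrac`, `msup`, `msub`, `msqrt`), and it assumes well-formed input with no named entities or special operator characters that would need translating.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def mathml_to_latex(node):
    """Recursively convert a small subset of presentation MathML to LaTeX."""
    tag = local_name(node.tag)
    kids = [mathml_to_latex(child) for child in node]
    if tag in ("mi", "mn", "mo"):
        # Leaf tokens: identifiers, numbers and operators.
        return (node.text or "").strip()
    if tag == "mfrac":
        return f"\\frac{{{kids[0]}}}{{{kids[1]}}}"
    if tag == "msup":
        return f"{kids[0]}^{{{kids[1]}}}"
    if tag == "msub":
        return f"{kids[0]}_{{{kids[1]}}}"
    if tag == "msqrt":
        return f"\\sqrt{{{' '.join(kids)}}}"
    # math, mrow and anything unrecognised: just join the children.
    return " ".join(kids)


def convert(mathml_source):
    """Convert a MathML string to a LaTeX fragment."""
    return mathml_to_latex(ET.fromstring(mathml_source))
```

A real converter also has to deal with Greek letters, fences, matrices, alignment and spacing, which is where most of the difficulty lies.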

So the problem remains how to batch extract MathType equations from a Word document. I can do them one at a time with MathType. The bizarre thing is that Design Science appears to have done such a bad job on the LaTeX export. Their MathML export seems pretty good, so if I find a working converter from MathML to LaTeX, doing them one at a time wouldn't be too bad.
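As a first step towards batch extraction, the embedded equation objects can at least be located programmatically. The sketch below assumes the file is in .docx format, where each embedded OLE object appears in `word/document.xml` as an `o:OLEObject` element whose relationship id points at a binary part in the archive. The `Equation.DSMT` ProgID prefix used to recognise MathType objects is an assumption and should be checked against your own files. Decoding the equation data inside each binary part is not attempted here.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:schemas-microsoft-com:office:office"
REL_ATTR_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PACKAGE_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Assumed ProgID prefix for MathType objects; verify against a real document.
MATHTYPE_PREFIX = "Equation.DSMT"


def find_mathtype_objects(docx):
    """Return (ProgID, archive target) pairs for embedded MathType objects."""
    with zipfile.ZipFile(docx) as archive:
        document = ET.fromstring(archive.read("word/document.xml"))
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))

    # Map relationship ids (e.g. "rId5") to the parts they point at.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PACKAGE_REL_NS}}}Relationship")
    }

    found = []
    for ole in document.iter(f"{{{OFFICE_NS}}}OLEObject"):
        prog_id = ole.get("ProgID", "")
        if prog_id.startswith(MATHTYPE_PREFIX):
            rel_id = ole.get(f"{{{REL_ATTR_NS}}}id")
            found.append((prog_id, targets.get(rel_id)))
    return found
```

The returned targets are paths relative to the `word/` folder of the archive, in document order, so they can be matched back to the position of each equation in the text.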

However, the commercial solution, Word2TeX, proves that it can be done.

Best Answer

The answer has gradually been accumulating in the question. The question title was edited from "latest" to "2013", but perhaps that person didn't notice that the discussion was still active in January this year. I thought it was time to move the points accumulated so far into an answer. Then as new points become available I'll add them here. I will commit to keeping this thread up to date. Next year I'll change it to 2015 and so on, adding anything new and removing anything redundant.

What have I tried already?

  1. docx2tex: This is remarkable in some ways and sorely lacking in others. I was not a low-end user of Word; I used styles to structure my documents consistently. It gets all the text out and separates headings, but with zero formatting. It gets all the graphics out, but I had to batch convert them to PDF using GraphicConverter before I could use them. All my MathType equations were converted to graphics. There were various other problems that I won't go into yet.

    On the plus side, if I do end up having to start from scratch, at least it produces a good starting point, although a lot of work would still remain.

  2. MathType: It has a feature for converting equations to LaTeX, but it's very clunky. It uses a very limited set of maths environments, it wraps things in layer upon layer of unnecessary brackets, and you have to convert the equations one at a time.

  3. MathParser: MathType produces MathML output as well. I thought that if I converted MathType to MathML first, then the result might convert more nicely using MathParser, but I've downloaded the Java applet and all I get is blank output.

  4. writer2latex: A suggestion from drat. I downloaded OpenOffice and installed the writer2latex extension. When it imports the Word file it imports the MathType equations as graphics. It's good at exporting the heading and body styles but bad at exporting the graphics.

  5. word2tex: A suggestion from Harish Kumar. I downloaded the 30-day trial and I have to say that, in comparison to what I've looked at so far, this is stunning. If anyone wants to try it, go to Chikrii Softlab and download the 30-day trial. The trial only converts 1 table, 1 image and 7 equations, but it does all headings and body text. To put it to the test, put together a file from a subset of what you want to translate that makes the most of those limits. It will count a complex equation with several lines in it as 1 equation.

    At some point, if I can't eventually find a suitable solution, I will consider buying this one. It's not cheap. At $45 for an individual academic license, it raises the question: if you are an academic who later uses it to sell a textbook, do you technically owe them the other $44?
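On the unnecessary brackets in MathType's LaTeX export (item 2 above), some of the clean-up can be automated. The sketch below is a heuristic of my own, not anything MathType provides. It assumes the redundancy takes the form of directly nested groups such as `{{x}}`, and it collapses each of those to a single group while leaving escaped braces (`\{`, `\}`) alone. It could change the meaning of the rare macros that treat a doubled group specially, so the output should be checked.

```python
def match_braces(source):
    """Map the index of each unescaped '{' to the index of its matching '}'."""
    stack, pairs = [], {}
    i = 0
    while i < len(source):
        char = source[i]
        if char == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \\
            continue
        if char == "{":
            stack.append(i)
        elif char == "}" and stack:
            pairs[stack.pop()] = i
        i += 1
    return pairs


def collapse_nested_braces(source):
    """Rewrite {{...}} as {...} wherever a group contains only another group."""
    pairs = match_braces(source)
    drop = set()
    for open_i, close_i in pairs.items():
        # The inner group starts right after '{' and ends right before '}'.
        if pairs.get(open_i + 1) == close_i - 1:
            drop.update((open_i + 1, close_i - 1))
    return "".join(ch for i, ch in enumerate(source) if i not in drop)
```

For example, `collapse_nested_braces(r"\frac{{{a}}}{{b}}")` gives `\frac{a}{b}`, while `x^{{a}{b}}` is left unchanged because the outer group holds two groups rather than one.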