[Tex/LaTex] Recommended workflow? LaTeX -> ePub with math viewable on iPad

ipad mathml pandoc svg tex4ht

This started as a comment in an older thread, but I thought I'd repost it here as a question to the forum…

I'm looking for a workflow that will take me efficiently and reliably from existing LaTeX documents (e.g., books) with lots of (sometimes) complex embedded math to high-quality ePubs that will display nicely in iBooks under iOS6 and, ideally, on other readers as well (lower priority).

I've been looking at LaTeXML, TeX4ht, plasTeX, XhtmLaTeX, pandoc, ebook-convert, and the tbook DTD (which starts with XML rather than LaTeX and thus looks promising primarily for new documents). All of these tools seem to have a lot to offer, but none of them, as far as I can tell, gets me to the finish line without significant manual intervention (and, with it, significant learning and debugging). I haven't yet been able to decide which pathway is worth investing the time and energy in. Looking forward to an up-to-date assessment.

I'll note that I already published an ePub textbook with equations rendered as SVG, and it displayed very nicely in iBooks under iOS5. Unfortunately, iOS6 broke it, and the eBook is now a virtual paperweight. I'm trying to recover and get the same book back into a usable ePub format that renders equations nicely.

I don't mind investing in a commercial product if it's the least painful means to solve the problem, but I'd prefer to build a script-based workflow based on open-source tools for Mac OS X.

Best Answer

The basic problem you're running into is the immaturity of the epub format.

It's very easy for a publisher to produce an epub 3 of a novel if they're already set up to produce epub 2, and the epub 3 version will typically work fine on readers designed for epub 2.

However, progress to date seems to have been very slow in getting publishers and device manufacturers going on the new and fancy epub 3 features such as mathml. It's probably an economic issue. Publishers can't wave a magic wand over their catalogs and produce mathml for all the equations that appear in all their books; it would be an expensive case-by-case slog for them. Their cash cows are K-12 and college textbooks, and most of those were designed in a large format that is not suitable for handheld devices. Since the publishers have little economic motivation to start selling epub 3 with mathml, the hardware manufacturers have little economic motivation to start supporting mathml in their devices. Apple seems to have partial, lousy support on some of their devices. Meanwhile, Amazon shows no interest at all in making math work in their format. I wish I could hold out some hope that this would get fixed sooner rather than later, but, frankly, the experience with mathml in the browser doesn't encourage such a hope. For example, Wikipedia still doesn't do mathml after all these years. Because of all these factors, there has not been much progress yet in getting good open-source tools for producing epub 3+mathml.

Since epub 3+mathml isn't likely to become good and mature in the near future, it's worth considering holding off completely on putting a lot of work into converting a book into the format.

Having said that, I do have some experience experimenting with doing this. Basically epub is xhtml, so if you can get xhtml+mathml output from your latex, you're not that far from having a working epub 3+mathml book. There are already lots and lots of tools for converting latex to html. (You listed them in your question.)
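To make the "epub is xhtml" point concrete, here is a minimal sketch of an epub 3 container holding one XHTML page with a MathML integral. It is not taken from my own scripts; the file names (`OEBPS/content.opf`, `ch1.xhtml`, `nav.xhtml`), the placeholder identifier, and the function name `build_epub` are all assumptions made for illustration, and I make no promise that every reader will accept the result.

```python
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Placeholder metadata; the uuid and date are made up for this sketch.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>MathML test</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2013-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml" properties="mathml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc">
    <ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol>
  </nav>
</body>
</html>
"""

CH1_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title></head>
<body>
  <p>A test integral:
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow><mo>&#x222B;</mo><mi>x</mi><mi>d</mi><mi>x</mi></mrow>
    </math>
  </p>
</body>
</html>
"""


def build_epub(path):
    """Write a one-chapter epub 3 file to `path`."""
    with zipfile.ZipFile(path, "w") as z:
        # The mimetype entry has to come first and be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML,
                   compress_type=zipfile.ZIP_DEFLATED)
        z.writestr("OEBPS/content.opf", CONTENT_OPF,
                   compress_type=zipfile.ZIP_DEFLATED)
        z.writestr("OEBPS/nav.xhtml", NAV_XHTML,
                   compress_type=zipfile.ZIP_DEFLATED)
        z.writestr("OEBPS/ch1.xhtml", CH1_XHTML,
                   compress_type=zipfile.ZIP_DEFLATED)
```

Calling `build_epub("test.epub")` gives a small file that is handy for checking how a particular device renders an integral sign before converting a whole book.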

There is an open-source program called calibre that will convert any valid XHTML 1.1 + CSS 2.1 document to valid epub 2. What I did was to generate xhtml output, translate it to epub 2 using calibre, and then patch the epub 2 to try to make it valid epub 3. (Calibre is not capable of outputting epub 3+mathml according to the spec, and unfortunately the developer seems to have zero enthusiasm for making it do so: http://www.mobileread.com/forums/showpost.php?p=1904668&postcount=7 . ) Both my book and the scripts I wrote for patching are open source, so anyone who wants to tinker with them is welcome to: https://github.com/bcrowell/calculus . From a brief look at Andrew Stacey's page, it looks like the approach he's used is fairly similar.
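As a rough illustration of what "patching" means, the sketch below covers only two of the changes involved: bumping the package version to 3.0 and flagging every manifest item whose XHTML contains MathML. This is not the code from the repository above. The function name `upgrade_opf` and the `documents` mapping are hypothetical, and a real conversion needs further steps (a navigation document and a `dcterms:modified` entry, for instance) that are left out here.

```python
import re
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")

# Matches <math ...> as well as prefixed forms such as <m:math ...>.
MATH_TAG = re.compile(r"<(\w+:)?math[\s>]")


def upgrade_opf(opf_xml, documents):
    """Return a patched copy of an epub 2 package file.

    `documents` maps each manifest href to the XHTML source of that file
    (a hypothetical interface chosen for this sketch).
    """
    root = ET.fromstring(opf_xml)
    root.set("version", "3.0")
    for item in root.iter("{%s}item" % OPF_NS):
        if item.get("media-type") != "application/xhtml+xml":
            continue
        source = documents.get(item.get("href"), "")
        if MATH_TAG.search(source):
            # epub 3 wants content documents with MathML declared in the manifest.
            props = item.get("properties", "").split()
            if "mathml" not in props:
                props.append("mathml")
            item.set("properties", " ".join(props))
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written back over the package file inside the unpacked epub before zipping it up again.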

The best epub 3 output I was able to produce is here: http://www.lightandmatter.com/calc/ . I don't own an iAnything, but I got one of my students to show it to me on his device, and basically it seemed to have worked to the extent that Apple had implemented mathml correctly on the device. (Their implementation at that time, about a year ago, was pretty awful, though. E.g., integral signs appeared as boxes.)

Please don't even think about trying to use mathjax. AFAIK most readers don't support javascript in epub 3 at all. Frankly, I wouldn't want the feature activated on an ebook reader I owned. (Think ads, animations, annoying idiosyncratic user interfaces like the unskippable stuff at the beginning of a DVD.) Let's keep in mind that mathjax is a beautifully executed kludge, whose sole purpose is to cover up for Microsoft's failure to implement mathml in IE. Even on a desktop computer, its performance can be bad, and on a handheld device it would probably be atrocious. The epub 3 standard provides a standard way to do mathml, so that's the right way to do mathml on those devices.

Testing is a problem. I use the open-source java program epubcheck to check whether my epub output is valid. However, just because epubcheck says it's valid, that doesn't mean it will render correctly on handheld devices. There will probably be a period of a decade or more during which some people's devices can handle epub 3+mathml and other people's won't. Calibre 0.8.66+ can display epub+mathml properly, but it uses mathjax, which is completely different from the implementation of mathml on handheld readers. Calibre does not currently output epub 3 at all, which is why I wrote the scripts to patch its output to make it valid epub 3.
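For scripting the validation step, something like the following wrapper would do. It is only a sketch: the default location of `epubcheck.jar` is an assumption, and it relies on epubcheck exiting with a nonzero status when it finds errors, which you should confirm for the version you install.

```python
import subprocess


def epubcheck_command(epub_path, jar_path="epubcheck.jar"):
    """Build the command line; jar_path is wherever you unpacked epubcheck."""
    return ["java", "-jar", jar_path, epub_path]


def run_epubcheck(epub_path, jar_path="epubcheck.jar"):
    """Run epubcheck and return (passed, combined_output)."""
    result = subprocess.run(
        epubcheck_command(epub_path, jar_path),
        capture_output=True,
        text=True,
    )
    # Assumes a zero exit status means no errors were reported.
    return result.returncode == 0, result.stdout + result.stderr
```

A passing result only says the file is valid; how the equations look on an actual device still has to be checked by eye.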
