[TeX/LaTeX] How to convert HTML to LaTeX

conversion html

I would like a way to convert a document from HTML to LaTeX on a Windows platform. What is my best option?

Best Answer

What your best option would be depends a lot on what your needs are. Are you only trying to import the structure, or the exact look, or something else? How important is it that the resulting document really be done properly?

Anyway, here are a number of things to try.

AbiWord: an open source word processor that can import HTML or similar formats and export LaTeX. (Be sure to install the extra export plugins when installing; the default install doesn't include a LaTeX export, but it can easily be chosen.)

Writer2LaTeX: an OpenOffice plugin for exporting to LaTeX; OpenOffice supports HTML import, of course. (W2L can handle .odt to .tex even without OpenOffice installed, but then converting .html to .odt might be trickier.)

rtf2latex2e: as its name implies, converts RTF to LaTeX, so you'd need some way to convert HTML to RTF (though that's relatively easy and can be done with almost any word processor).

pandoc: a Haskell program for converting between various markup languages, including HTML and LaTeX.

html2latex: a Perl script for such conversions. (I've never tried it but plan on doing so soon.)

htmltolatex: a Java program along similar lines. (Again, I haven't tried it.)
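Since pandoc is a command-line tool, it can be scripted. Here is a minimal sketch in Python, assuming pandoc is installed and on your PATH; the function names and file names are made up for illustration and are not part of pandoc. It only uses pandoc's standard `-f`, `-t`, `-s`, and `-o` options.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(html_path, tex_path, standalone=True):
    """Build the pandoc argument list for an HTML -> LaTeX conversion.

    (Hypothetical helper for this example, not part of pandoc.)
    """
    cmd = ["pandoc", "-f", "html", "-t", "latex"]
    if standalone:
        # -s asks pandoc for a complete document (with a preamble),
        # rather than a fragment to paste into an existing .tex file.
        cmd.append("-s")
    cmd += ["-o", str(tex_path), str(html_path)]
    return cmd


def html_to_latex(html_path, tex_path=None, standalone=True):
    """Run pandoc on html_path and return the path of the .tex file written."""
    html_path = Path(html_path)
    # Default output name: same file name with a .tex extension.
    tex_path = Path(tex_path) if tex_path else html_path.with_suffix(".tex")
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(html_path, tex_path, standalone), check=True)
    return tex_path


if __name__ == "__main__":
    # Assumed example file name; replace with your own document.
    print(html_to_latex("page.html"))
```

The equivalent one-off command at a prompt would be `pandoc -f html -t latex -s -o page.tex page.html`. As with the other tools, expect to clean up the generated LaTeX by hand.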

Even with all those options, however, personally, if it was something I truly cared about doing right, simply transferring over the plain text and redoing everything manually would still be my solution of choice. The above are just quick fixes for a document of relatively little importance, or when having it in LaTeX in addition to HTML is just a matter of convenience.