What your best option is depends a lot on what your needs are. Are you trying to import only the structure, or the exact look, or something else? How important is it that the resulting document really be done properly?
Anyway, here are a number of things to try.
AbiWord: an open source word processor that can import HTML or similar formats and export LaTeX. (Be sure to install the extra export plugins when installing; the default install doesn't include a LaTeX export, but it can easily be chosen.)
Writer2LaTeX: an OpenOffice plugin for exporting to LaTeX; OpenOffice supports HTML import, of course. (W2L can also handle .odt to .tex without OpenOffice installed, but then converting .html to .odt might be trickier.)
rtf2latex2e: as its name implies, converts RTF to LaTeX; so you'd need some way to convert HTML to RTF (though that's relatively easy, can be done with most any word processor).
pandoc: a Haskell program for converting between various markup languages, including HTML and LaTeX.
html2latex: a Perl script for such conversions. (I've never tried it but plan on doing so soon.)
htmltolatex: a Java program along similar lines. (Again, I haven't tried it.)
Even with all those options, however, personally, if it was something I truly cared about doing right, simply transferring over the plain text and redoing everything manually would still be my solution of choice. The above are just quick fixes for a document of relatively little importance, or when having it in LaTeX in addition to HTML is just a matter of convenience.
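For the pandoc route listed above, a minimal sketch might look like the following. The function name html_to_latex and the file names are placeholders chosen for illustration; the flags used (-f, -t, -s, -o) are standard pandoc options, but the quality of the result will depend on how clean the HTML is.

```shell
# Sketch: convert an HTML file to a standalone LaTeX document with pandoc.
# html_to_latex is a hypothetical helper name, not part of pandoc itself.
# Usage: html_to_latex input.html output.tex
html_to_latex() {
    # -f / -t name the input and output formats;
    # -s (--standalone) emits a full document with a preamble
    # instead of a bare fragment; -o sets the output file.
    pandoc -f html -t latex -s -o "$2" "$1"
}
```

Calling html_to_latex page.html page.tex should leave a compilable page.tex next to the source; expect to clean up tables and images by hand.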
Using PDF as an intermediate format when converting from LaTeX to HTML is not a very good idea. LaTeX and HTML are both mostly structural markup languages, which means you use them to describe the document structure (sections, emphasis, formulas, etc.), whereas PDF is mostly about the representation of your document on screen or paper. When converting LaTeX to PDF, you lose much of the structural information, and it cannot be successfully recovered by converting from PDF to HTML.
It is much better to convert LaTeX directly to HTML. There are a number of ways (WayBack Archive) to do that; one I would recommend is htlatex. It is probably already part of your TeX distribution, is very powerful and flexible, and its use can be as simple as running
htlatex mydocument.tex
If you tell us more about your environment (which operating system you use, which TeX distribution, your text editor or LaTeX IDE, how you generated the PDF file, etc.), we may be able to give you more details on how to use htlatex.
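As a hedged sketch of going slightly beyond the bare command, the wrapper below passes a commonly used set of options asking for XHTML output with UTF-8 encoding. The function name tex_to_html is made up for illustration, and the option strings are an assumption about what suits your document; check them against your TeX4ht installation.

```shell
# Sketch: run htlatex with UTF-8 output and print the resulting file name.
# tex_to_html is a hypothetical helper name.
# Usage: tex_to_html mydocument.tex
tex_to_html() {
    # 2nd argument: options for the tex4ht package (XHTML, UTF-8 charset).
    # 3rd argument: options for the tex4ht postprocessor (UTF-8 fonts);
    # the leading space in that string is intentional.
    htlatex "$1" "xhtml,charset=utf-8" " -cunihtf -utf8" || return 1
    # htlatex writes <jobname>.html into the current directory.
    echo "$(basename "$1" .tex).html"
}
```

Running tex_to_html mydocument.tex from the directory containing the source should produce mydocument.html alongside a CSS file.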
Best Answer
pdf2htmlEX can convert PDF to HTML without losing formatting.