[TeX/LaTeX] Maintained tool to faithfully convert LaTeX/PDF to HTML

html

There is a tool called pdf2htmlEX that does an impressive job of converting PDF to HTML. The output is very faithful: even books full of mathematics and diagrams are converted accurately.

Unfortunately, it is no longer maintained. Is anyone aware of a maintained tool that does as good a job of converting either PDF or LaTeX into HTML?

Best Answer

I'd recommend make4ht. From the documentation:

make4ht is a simple build system for tex4ht, TeX to XML converter. It provides a command line tool that drives the conversion process. It also provides a library which can be used to create customized conversion tools.

The author of make4ht, michal-h21, is a very active contributor to this site.

Let's use the following small example, mwe.tex, in what follows:

\documentclass{article}

\begin{document}

Here is some text. And here is some mathematical content $y=x^2$.
\end{document}

example 1

Running

 make4ht.exe mwe.tex

gives the output:

<!DOCTYPE html> 
<html lang="en-US" xml:lang="en-US" > 
<head><title></title> 
<meta  charset="iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)" /> 
<meta name="viewport" content="width=device-width,initial-scale=1" /> 
<link rel="stylesheet" type="text/css" href="mwe.css" /> 
<meta name="src" content="mwe.tex" /> 
</head><body 
>
<!--l. 5--><p class="noindent" >Here is some text. And here is some mathematical content <span 
class="cmmi-10">y </span>= <span 
class="cmmi-10">x</span><sup><span 
class="cmr-7">2</span></sup>. </p> 
</body> 
</html>
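
The commands in this answer were run on Windows, hence make4ht.exe. On Linux or macOS the executable is normally just make4ht. Here is a minimal sketch for reproducing example 1 from a POSIX shell, assuming make4ht is on the PATH; the availability check and its message are only illustrative and not part of make4ht itself:

```shell
#!/bin/sh
# Write the small example document used throughout this answer.
# The quoted 'EOF' stops the shell from touching the backslashes and the $...$ math.
cat > mwe.tex <<'EOF'
\documentclass{article}

\begin{document}

Here is some text. And here is some mathematical content $y=x^2$.
\end{document}
EOF

# Run the conversion only if make4ht can be found on the PATH.
if command -v make4ht >/dev/null 2>&1; then
    # With no further options this should produce mwe.html and mwe.css,
    # matching the stylesheet link in the output shown above.
    make4ht mwe.tex
else
    echo "make4ht not found; it is normally installed with your TeX distribution" >&2
fi
```

If the run succeeds, opening mwe.html in a browser should show the same markup as listed above.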

example 2

From here we can customise the output by employing configuration files; if you have the following:

roxy.cfg

\Preamble{mathml,-css,NoFonts}
\Configure{@HEAD}{\HCode{<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML">\Hnewline</script>\Hnewline}}
\begin{document}
\EndPreamble

and run

make4ht.exe -f html5 -c roxy.cfg mwe.tex

then you receive

<!DOCTYPE html> 
<html lang="en-US" xml:lang="en-US" > 
<head> <title></title> 
<meta  charset="iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)" /> 
<meta name="viewport" content="width=device-width,initial-scale=1" /> 
<link rel="stylesheet" type="text/css" href="\aa:CssFile " /> 
<meta name="src" content="mwe.tex"> 
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> 
</script> 
</head><body 
>
<!--l. 5--><p class="noindent" >Here is some text. And here is some mathematical content
<!--l. 5--><math 
 xmlns="http://www.w3.org/1998/Math/MathML"  
display="inline" ><mi 
>y</mi> <mo 
class="MathClass-rel">=</mo> <msup><mrow 
><mi 
>x</mi></mrow><mrow 
><mn>2</mn></mrow></msup 
></math>.

</body> 
</html>

example 3

If you have HTML Tidy installed, then you can customise your build process to employ it by using the following roxy.mk4 file:

roxy.mk4

Make:match("html$", "tidy -m -config html-tidy.txt -i ${filename}")

and an html-tidy.txt configuration file

// sample config file for HTML tidy
indent: auto
indent-spaces: 2
quiet: yes
output-xhtml: no
output-html: yes

then you can run

make4ht.exe -f html5 -e roxy.mk4 -c roxy.cfg mwe.tex

to receive

<!DOCTYPE html>
<html lang="en-US">
<head>
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Windows version 5.6.0">
  <title></title>
  <meta charset="utf-8">
  <meta name="generator" content=
  "TeX4ht (http://www.tug.org/tex4ht/)">
  <meta name="viewport" content=
  "width=device-width,initial-scale=1">
  <link rel="stylesheet" type="text/css" href="\aa:CssFile">
  <meta name="src" content="mwe.tex">
  <script src=
  "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</head>
<body>
  <!--l. 5-->
  <p class="noindent">Here is some text. And here is some
  mathematical content <!--l. 5--><math xmlns=
  "http://www.w3.org/1998/Math/MathML" display="inline">
  <mi>
    y
  </mi>
  <mo class="MathClass-rel">
    =
  </mo>
  <msup>
    <mrow>
      <mi>
        x
      </mi>
    </mrow>
    <mrow>
      <mn>
        2
      </mn>
    </mrow>
  </msup></math>.</p>
</body>
</html>
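
Example 3 depends on two external programs (make4ht and tidy) and on several small files being present in the working directory. The following is a rough shell sketch that recreates roxy.mk4 and html-tidy.txt exactly as listed above and runs the build only when everything is in place. The preflight check and the `missing` variable are illustrative additions, not something make4ht requires, and the command name make4ht (without .exe) assumes a Unix-like system:

```shell
#!/bin/sh
# Recreate the build file from example 3. The quoted 'EOF' keeps ${filename}
# literal, so that make4ht (not the shell) substitutes the file name.
cat > roxy.mk4 <<'EOF'
Make:match("html$", "tidy -m -config html-tidy.txt -i ${filename}")
EOF

# Recreate the HTML Tidy settings file from example 3.
cat > html-tidy.txt <<'EOF'
// sample config file for HTML tidy
indent: auto
indent-spaces: 2
quiet: yes
output-xhtml: no
output-html: yes
EOF

# Collect anything that is missing before attempting the build.
missing=""
for tool in make4ht tidy; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
for file in mwe.tex roxy.cfg; do
    [ -f "$file" ] || missing="$missing $file"
done

if [ -z "$missing" ]; then
    # Same command as in example 3, minus the Windows .exe suffix.
    make4ht -f html5 -e roxy.mk4 -c roxy.cfg mwe.tex
else
    echo "not running make4ht; missing:$missing" >&2
fi
```

Checking up front is only a convenience: it reports a missing tidy or roxy.cfg before the build starts rather than partway through it.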