My aim is to produce an HTML file with the same text as the PDF produced by LaTeX. The HTML should represent the pagination and line-break structure of the PDF: when there is a line break in the PDF I want to produce a <br> in HTML, when there is a paragraph I want to produce a <p> in HTML, and when there is a new page in the PDF I want to produce a horizontal line (<hr>) in HTML.
Handling the paragraphs is easy, since they are defined in the input file. But line breaking and pagination depend on the font and on the width and height of the document (and maybe on other things I cannot even imagine yet).
Is there a way of getting LaTeX to tell me where it broke the lines and where it started a new page?
Best Answer
This LaTeX:
Produces a log file showing the position of all the output:
So with a bit of Perl (which might need to be made smarter in a real example) you can reconstitute the text, adding the requested line and paragraph markup:
then
perl zz.pl zz.log > zz.html
produces the following HTML, which looks like this when rendered: