[Tex/LaTex] Can LaTeX be persuaded to produce text output?

conversion output

I've read here and there that ConTeXt can produce XML output. We also have, from time to time, questions about converting LaTeX to different formats. On the basis that "The only parser for TeX is tex", if latex could produce text output instead of PDF then it would be possible to write a style file to convert reasonable input to a different markup language.

Would this be possible?

Bit of background: I encounter this "can we convert from LaTeX?" question in the context of the nLab where the input format is Markdown+iTeX (iTeX not being anything to do with Knuth's proposal but a subset-of-LaTeX-to-MathML converter) but people often have snippets of LaTeX articles that they want to include. So converting all the way to XHTML+MathML via, say, tex4ht isn't the right option. I wrote a Perl script that reimplements much of TeX to do this, but after doing so realised that my style files would work in ordinary LaTeX and produce the "right" output, except that they would be embedded in a PDF. So if I could just persuade TeX to produce text, I'd be almost there. Of course, I could try to extract the text from the PDF but that "feels wrong" and I'd worry about extra stuff sneaking in by accident.

Best Answer

The underlying solution is of course the same for ConTeXt and LaTeX: you need a way of changing what macros do, so that they write the correct output rather than typesetting. This is much the same as what tex4ht does. The advantage ConTeXt has is that the macros are provided mainly by one focussed group, and they include the necessary 'back end' to make that conversion easy. To do the same for LaTeX, you need to handle all of the macros that might be present, which is a problem given the number and variety of LaTeX packages. So while in principle it's possible, the implementation is a severe challenge.
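To make the idea of "macros that write rather than typeset" concrete, here is a minimal sketch, not a complete converter. It assumes the target is Markdown-like text written to `\jobname.md`; the stream name `\mdout` and the helpers `\emit` and `\para` are hypothetical names invented for this illustration, and `\para` exists only because the sketch makes no attempt to capture running text automatically.

```latex
\documentclass{article}

% Open a plain-text output stream. The name \mdout and the .md
% extension are assumptions made for this sketch.
\newwrite\mdout
\immediate\openout\mdout=\jobname.md
\AtEndDocument{\immediate\closeout\mdout}

% Hypothetical helper: write one line to the output file.
% \write fully expands its argument, so redefined macros inside
% the argument are replaced by their text-producing definitions.
\newcommand{\emit}[1]{\immediate\write\mdout{#1}}

% Redefine markup macros to produce text instead of typesetting.
% This simplified \section ignores the starred form and the
% optional argument, and emits a Setext-style Markdown heading.
\renewcommand{\emph}[1]{*#1*}
\renewcommand{\section}[1]{\emit{#1}\emit{-----}\emit{}}

% Hypothetical wrapper for a paragraph of running text.
\newcommand{\para}[1]{\emit{#1}\emit{}}

\begin{document}
\section{Converting \emph{reasonable} input}
\para{Plain text with \emph{emphasis} ends up in the output file.}
\end{document}
```

Running `latex` on this file should produce no typeset pages and leave the converted text in the `.md` file. It also shows where the difficulty lies: every macro that can appear in the input, including those from packages, needs a definition that expands safely inside `\write`, and anything fragile or unhandled will either break or leak into the output.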

(With my 'LaTeX3 hat' on, this is an obvious area to bear in mind when defining an updated format. To do that, you need to have a much more 'regular' syntax and input than is often the case with LaTeX files at present. Again, I think ConTeXt shows how this can be done, as it is already good at keeping the input within its own structures.)