Inspired by the author's motivation for asking "Is there a BNF grammar of tex language".
Are there any well-made libraries that can parse some subset of TeX mathematics independently of the TeX engine? Important points to consider in answers:
- How large a subset of TeX mathematical notation is supported?
- Is the parser portable? Does it have any dependencies?
- Is the parser closely tied to a particular backend, or could it easily be used to support multiple output formats? In other words, how easily could it be integrated into a new system that had to support output to PDF, HTML, PNG, etc.?
For example, I know of the following parsers but not much about their applicability outside the use cases for which they were designed (Matplotlib graphics and math rendering in web browsers):
- The mathematical expression handler in Python's Matplotlib.
- The MathJax rendering library for JavaScript.
Best Answer
I've been looking into this too, so I'll share some observations that fall rather short of a proper answer, which would really involve looking at a whole lot of source code and asking the right questions about it.
Parsers generating HTML+MathML
tex4ht runs latex with modified macros that insert specials into the DVI output, and parses the DVI output instead.
Parsers generating XML
Jason Blevins has a list of tools that convert Latex documents to XML-based formats, and that handle equations reasonably. Romeo Anghelache's Hermes, which is part of a full Latex parser that generates XML with semantic markup, is worth singling out: like tex4ht, it works by running the Tex engine with macros to put specials in the DVI output, which it then parses; it supports a wider set of semantic markup.
Fragments of Latex or DVI
With the exception of the systems referencing Webtex, there doesn't seem to be much interest in clearly codifying the subsets of Latex to be parsed, I guess because these are regarded as moving targets. Instead, lists of supported commands, like the one I mentioned for Mathjax, seem to be the way things are done.
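To make the "list of supported commands" approach concrete, here is a minimal Python sketch. The whitelist `SUPPORTED` and the function `unsupported_commands` are hypothetical names invented for illustration; they are not taken from Mathjax or any other tool, and a real list would be far longer.

```python
import re

# Hypothetical whitelist; a real tool would publish a much longer list.
SUPPORTED = {"frac", "sqrt", "alpha", "beta", "sum", "int", "left", "right"}

# A control sequence is a backslash followed by letters (a control word,
# captured in group 1) or by a single non-letter (a control symbol).
CONTROL_SEQUENCE = re.compile(r"\\(?:([A-Za-z]+)|[^A-Za-z])")


def unsupported_commands(formula, supported=SUPPORTED):
    """Return control words in `formula` that are not whitelisted,
    in order of first appearance and without duplicates."""
    missing = []
    for match in CONTROL_SEQUENCE.finditer(formula):
        name = match.group(1)          # None for control symbols such as \,
        if name is not None and name not in supported and name not in missing:
            missing.append(name)
    return missing
```

A check like this only says whether the vocabulary is covered; it says nothing about whether the arguments are well formed, which is where a codified grammar would help.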
With DVI-based converters, the issue of parsing Latex goes away, replaced by the relatively trivial issue of parsing marked-up DVI and the trickier issue of identifying the semantically significant macros and constructing markup-issuing replacements that do not improperly interfere. I haven't looked at how this is done for equational layout. It would be a useful exercise to see how a converter from Tex formulae to those of Heckmann & Wilhelm (1997) would work. It's worth noting that the representation of expressions is essentially a superset of that used by Heckmann & Wilhelm (1997).
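As a rough illustration of why parsing marked-up DVI is the easy part, the following Python sketch walks a DVI byte string and collects the payloads of the specials it finds. The function name `dvi_specials` is hypothetical, the opcode table reflects my reading of the standard DVI format, and extended formats (XDV, pTeX) are not handled. What the payloads mean is entirely up to the macros that emitted them.

```python
def _fixed_lengths():
    """Map each fixed-length DVI opcode to its number of parameter bytes."""
    sizes = {}
    for op in range(0, 128):            # set_char_0 .. set_char_127
        sizes[op] = 0
    # set, put, right, w, x, down, y, z, fnt: variants with 1..4 parameter bytes
    for base in (128, 133, 143, 148, 153, 157, 162, 167, 235):
        for i in range(4):
            sizes[base + i] = i + 1
    sizes[132] = sizes[137] = 8         # set_rule, put_rule: height and width
    for op in (138, 140, 141, 142, 147, 152, 161, 166):
        sizes[op] = 0                   # nop, eop, push, pop, w0, x0, y0, z0
    sizes[139] = 44                     # bop: ten counters plus a back pointer
    for op in range(171, 235):          # fnt_num_0 .. fnt_num_63
        sizes[op] = 0
    return sizes


FIXED = _fixed_lengths()


def dvi_specials(data):
    """Return the payloads (as bytes) of all specials before the postamble."""
    specials = []
    pos = 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op in FIXED:
            pos += FIXED[op]
        elif 239 <= op <= 242:                      # xxx1..xxx4: a special
            n = op - 238                            # width of the length field
            length = int.from_bytes(data[pos:pos + n], "big")
            pos += n
            specials.append(data[pos:pos + length])
            pos += length
        elif 243 <= op <= 246:                      # fnt_def1..fnt_def4
            pos += (op - 242) + 12                  # font number, checksum, scale, design size
            pos += 2 + data[pos] + data[pos + 1]    # two length bytes, then area and name
        elif op == 247:                             # pre: id, num, den, mag
            pos += 13
            pos += 1 + data[pos]                    # comment length, then comment
        elif op == 248:                             # post: no more page content
            break
        else:
            raise ValueError(f"unexpected DVI opcode {op}")
    return specials
```

The harder problem described above, deciding which macros to redefine and what their specials should say, is not touched by this sketch at all.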
Syntax highlighting
A completely different kind of parsing is involved in syntax highlighting, where the idea is to help the author see the significance of the parts of the formulae. I don't know of any syntax highlighters that do an interesting job here: Auctex only raises and lowers superscripts and subscripts, but I haven't really looked.
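For a sense of what the shallow end of highlighting looks like, here is a small Python sketch that splits a formula into classified tokens. The token classes and the name `classify` are my own illustrative choices, not how Auctex or any other editor mode works; a highlighter that showed the "significance" of parts would need real structure on top of this.

```python
import re

# Token classes a highlighter might colour differently (names are illustrative).
TOKEN_SPEC = [
    ("command", r"\\(?:[A-Za-z]+|.)"),   # control words and control symbols
    ("script",  r"[\^_]"),               # superscript and subscript markers
    ("group",   r"[{}]"),                # grouping braces
    ("space",   r"\s+"),
    ("char",    r"."),                   # anything else, one character at a time
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def classify(formula):
    """Split `formula` into (kind, text) pairs, dropping whitespace."""
    return [
        (m.lastgroup, m.group())
        for m in TOKEN_RE.finditer(formula)
        if m.lastgroup != "space"
    ]
```

For instance, `classify(r"x_i^{2}")` labels `_` and `^` as `script` tokens, which is about as much as raising and lowering scripts requires.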
Reference
Heckmann & Wilhelm, 1997, A Functional Description of TeX's Formula Layout.