[TeX/LaTeX] Is there a Python module for parsing LaTeX?


I want to write Python programs that modify LaTeX source files. To do this, I would like a basic Python parser that can reliably read and write LaTeX files while preserving the document tree. It does not need to be a full implementation, but it must handle LaTeX's odd quoting rules and {} notation. Regular expressions do not work for this, because braces can nest to arbitrary depth.


The main thing I want to handle is nested braces, which is why I need a parser rather than a simple lexical analyzer. That is, I want to be able to register \foo{} as a command I care about and catch:

\foo{this is the foo argument}

But I also want to be able to catch:

\foo{this is \emph{really} the foo argument}

Is there any such Python module out there?

Best Answer

Please see if the LatexWalker class of pylatexenc can help:

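from pylatexenc.latexwalker import LatexWalker

w = LatexWalker(r"\foo{this is \emph{really} the foo argument}")
(nodelist, pos, len_) = w.get_latex_nodes(pos=0)

# Assuming pylatexenc 2.x: \foo is not a known macro by default, so it is
# parsed as a macro node without arguments, followed by a separate group node.
print(nodelist[0].macroname)
print(nodelist[1].latex_verbatim())

Output:

foo
{this is \emph{really} the foo argument}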
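
To show why nested braces need a depth counter rather than a regular expression, here is a minimal, dependency-free sketch. It is not how pylatexenc works and is no substitute for it. The helper name find_macro_args is hypothetical, and the scanner deliberately ignores comments, verbatim environments, optional [..] arguments, and occurrences of \foo nested inside another \foo argument.

```python
def find_macro_args(source, name):
    """Return the brace-delimited argument of each top-level \\name{...}.

    Hypothetical helper, for illustration only: comments, verbatim
    environments and optional [..] arguments are not handled.
    """
    marker = "\\" + name + "{"
    results = []
    start = source.find(marker)
    while start != -1:
        i = start + len(marker) - 1  # index of the opening brace
        arg_start = i + 1
        depth = 0
        while i < len(source):
            ch = source[i]
            if ch == "\\":
                # Skip the backslash and the next character, so that
                # escaped braces such as \{ and \} are not counted.
                i += 2
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # Matching close brace found: record the argument.
                    results.append(source[arg_start:i])
                    break
            i += 1
        # Continue searching after the position where scanning stopped.
        start = source.find(marker, i + 1)
    return results


args = find_macro_args(r"\foo{this is \emph{really} the foo argument}", "foo")
print(args[0])
```

The depth counter is what a single regular expression cannot express: the closing brace of \foo is the one that brings the depth back to zero, not simply the first } encountered.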