[Tex/LaTex] Convert LaTex accented characters to UTF8 equivalent with Python

accentspythonunicode

I think this should be something standard, but I'm having hard finding anything that works.

Given a plain string containing LaTeX accented characters,
does anyone know of a python library which can convert the accents to UTF8 ?

I understand that one may put together a large dictionary by hand (a translation table/rule) and simply apply the translation rule on the string. But this is rather a tedious task, and before doing so, I'd much appreciate if someone can point out if there already is a working solution or not.

Best Answer

Well, it does not feel particularly great to answer my own question, but I though to share the resolution here, as it might be potentially useful to others. A python script, which does the transformation is now available at this Github repository. It's self-contained, is (hopefully) easy to use and can be modified to add new features. The current version is a prototype, and is intended to be used in SciLag (possibly after some modifications), a community project for open problems in mathematics. So, please fell free to share your comments.

Related Question