[TeX/LaTeX] Mapping from Unicode character to LaTeX-Symbol for BibTeX

bibtex unicode

I'm writing a little BibTeX exporter for the publication database of my institute. We have a lot of authors with all kinds of weird characters in their names, which get the "WTF is Unicode?" treatment from BibTeX.

As I have to preprocess author names and titles before exporting anyway, I thought that I could replace as many Unicode characters as possible with their LaTeX equivalents. There's an image with such a mapping on bibtex.org:
[image: mapping]

But that image is

  1. incomplete (e.g. capital German umlauts are missing) and
  2. not of much use to me in this form.

Does anyone know of such a mapping that is as complete as possible and available in a machine-readable format?

Edit: Juan's XML is probably as complete as it gets (I'll post a Python dictionary reduced to Unicode and LaTeX on GitHub). In the meantime, I also found the mapping that Zotero uses. It can be found in their SVN repository.

Edit2: OK, the Python dictionary can be found here, and the XSL Style Sheet to convert Juan's XML into a Python dictionary is here.
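
For illustration, here is a minimal sketch of how such a dictionary might be applied when preprocessing author names. The name `UNICODE_TO_LATEX`, the function `to_latex`, and the handful of entries are assumptions made up for this example; the actual dictionary is far larger and may use different LaTeX spellings.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping (an assumption for this sketch);
# a real table would contain thousands of entries.
UNICODE_TO_LATEX = {
    "ä": r'{\"a}',
    "ö": r'{\"o}',
    "ü": r'{\"u}',
    "Ä": r'{\"A}',
    "Ö": r'{\"O}',
    "Ü": r'{\"U}',
    "ß": r"{\ss}",
    "é": r"{\'e}",
    "ø": r"{\o}",
}


def to_latex(text):
    """Replace every mapped character; leave everything else untouched."""
    # Normalize to NFC first so that "u" followed by a combining diaeresis
    # becomes the single precomposed character "ü" and hits the table.
    text = unicodedata.normalize("NFC", text)
    return "".join(UNICODE_TO_LATEX.get(ch, ch) for ch in text)


print(to_latex("Jürgen Müller"))
```

Wrapping each replacement in braces keeps BibTeX from splitting or re-casing the accented letter, which is why the sketch stores the braces in the table rather than adding them later.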

Best Answer

A related question on SO points to

... an XML file from the W3C. It maps Unicode to HTML, MathML, LaTeX, Mathematica, and others. (The file is 1.4 MB, uncompressed.)

You can read more about it here: http://www.w3.org/TR/unicode-xml/
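
As a rough sketch of how that file could be reduced to a Unicode-to-LaTeX dictionary in Python: the snippet below assumes the XML consists of `character` elements carrying a decimal code point in a `dec` attribute and an optional `latex` child. That layout, the inline `SAMPLE` data, and the function name `build_mapping` are assumptions for illustration, so check them against the real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Small inline stand-in for the W3C file; the element and attribute
# names are an assumption about its layout, not a verified schema.
SAMPLE = r"""<charlist>
  <character id="U000E4" dec="228">
    <latex>\"{a}</latex>
  </character>
  <character id="U000DF" dec="223">
    <latex>{\ss}</latex>
  </character>
  <character id="U00041" dec="65"/>
</charlist>"""


def build_mapping(root):
    """Collect {unicode character: LaTeX string} from <character> elements."""
    mapping = {}
    for char in root.iter("character"):
        latex = char.findtext("latex")
        dec = char.get("dec")
        # Skip entries without a LaTeX form, and entries whose "dec"
        # is not a single plain code point.
        if latex is None or dec is None or not dec.isdigit():
            continue
        mapping[chr(int(dec))] = latex.strip()
    return mapping


mapping = build_mapping(ET.fromstring(SAMPLE))
print(mapping)
```

For the downloaded file, something like `build_mapping(ET.parse("unicode.xml").getroot())` would be the analogous call, where the local filename is again an assumption.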
