[TeX/LaTeX] Has the time come for a LaTeX3 successor to ‘listings’?

l3regex, latex3, listings, minted, package-writing

I'm currently looking for a nice and challenging open-source project to work/collaborate on after my thesis. I don't have a Comp. Sci. background, but I'd like to learn more about compiler construction and I also would like to improve my TeX skills, so I figured: Two birds, one stone.

I was thinking of a LaTeX package that would allow for more advanced syntax highlighting than the listings package currently offers. I use listings a lot and find it mostly pretty good, but it is very frustrating for more advanced work. Implementing proper syntax highlighting for a language with a context-sensitive grammar in listings is really a pain in the neck. Also, Unicode isn't supported "out of the box".


Sure, we've got minted, verbments, pythontex. Those packages successfully unleashed the power of Python and Pygments in the world of TeX; however, customisation of the syntax highlighting (colour, etc.) involves writing/customising a Pygments lexer; in other words, it requires some Python coding outside the .tex file. Wouldn't it be nice if everything could be done without using -shell-escape? Or is there no point in trying to replicate Pygments in TeX?


Arguably, more powerful syntax highlighting seems more within reach when fancyvrb is combined with powerful LaTeX3 packages such as l3regex.
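To make the idea concrete, here is a minimal sketch of keyword highlighting with l3regex alone; it is an illustration, not a proposed design. It assumes a LaTeX kernel recent enough to provide expl3, l3regex and \NewDocumentCommand without loading anything (older installations would need to load expl3, xparse and l3regex explicitly). The names \highlightkeywords, \demokeyword and \l_demo_code_tl, and the keyword list itself, are made up for this example.

```latex
\documentclass{article}
\usepackage{xcolor}

% Hypothetical styling hook: how a matched keyword is typeset.
\newcommand\demokeyword[1]{\textcolor{blue}{#1}}

\ExplSyntaxOn
\tl_new:N \l_demo_code_tl

% Wrap each whole-word keyword in \demokeyword{...}, then typeset the result.
\NewDocumentCommand \highlightkeywords { m }
  {
    \tl_set:Nn \l_demo_code_tl {#1}
    \regex_replace_all:nnN
      { \b(if|else|while|return)\b }       % \b is a word boundary
      { \c{demokeyword} \cB\{ \0 \cE\} }   % \0 is the full match
      \l_demo_code_tl
    \texttt { \tl_use:N \l_demo_code_tl }
  }
\ExplSyntaxOff

\begin{document}
\highlightkeywords{if x then return y else return z}
\end{document}
```

This only handles a single line passed as an ordinary argument, so it sidesteps everything fancyvrb would be needed for: verbatim reading, line breaks, special characters and so on.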

  1. Am I just fooling myself? Is there even a point in such a project? Or should we be content to use existing tools (listings, minted, etc.)?
  2. Without turning this into a biglist, what, if anything, do you find frustrating about listings? What would be on your wishlist for a hypothetical new package meant for typesetting source code?
  3. Would anybody interested in collaborating on such a project please stand up? Anybody? Hello…?
  4. What limitations of l3regex should I know about before deciding to use it for such a package?

Please do chime in below…

Best Answer

Limitations of l3regex

On page 12 of the l3regex documentation, the list under "The following features of PCRE or Perl will definitely not be implemented" includes:

  • Recursion: this is a non-regular feature.
  • Back-references: non-regular feature, this requires backtracking, which is prohibitively slow.

So it doesn't look like you will be able to track matched/unmatched parentheses using l3regex alone (at least to an arbitrary depth), and some other things will be complicated without additional tools.
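As a rough sketch of what such an "additional tool" could look like, the code below tracks parenthesis depth with an integer counter while mapping over the characters of a string, which is the part a regular expression cannot do for arbitrary depth. It assumes a reasonably recent expl3 (for \str_map_inline:nn); the names \demo_check_parens:n, \checkparens and the two demo variables are hypothetical.

```latex
\documentclass{article}
\ExplSyntaxOn
\int_new:N  \l_demo_depth_int
\bool_new:N \l_demo_balanced_bool

% Walk through the input one character at a time, keeping a depth counter.
\cs_new_protected:Npn \demo_check_parens:n #1
  {
    \int_zero:N \l_demo_depth_int
    \bool_set_true:N \l_demo_balanced_bool
    \str_map_inline:nn {#1}
      {
        \str_case:nn {##1}
          {
            { ( } { \int_incr:N \l_demo_depth_int }
            { ) }
              {
                \int_decr:N \l_demo_depth_int
                % A closing parenthesis with nothing open: stop early.
                \int_compare:nNnT \l_demo_depth_int < 0
                  { \str_map_break: }
              }
          }
      }
    % Balanced only if the depth is back to zero at the end.
    \int_compare:nNnF \l_demo_depth_int = 0
      { \bool_set_false:N \l_demo_balanced_bool }
  }

\NewDocumentCommand \checkparens { m }
  {
    \demo_check_parens:n {#1}
    \texttt {#1} :~
    \bool_if:NTF \l_demo_balanced_bool { balanced } { unbalanced }
  }
\ExplSyntaxOff

\begin{document}
\checkparens{f(g(x), h(y))}\par
\checkparens{f(g(x)}
\end{document}
```

The first call should report the input as balanced and the second as unbalanced. A highlighter would need the same kind of state (a counter or a stack) threaded between regex matches, which is where the extra work and the speed concerns come in.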

Ultimately, the question may be, "Are the extra things we can easily do with l3regex worth the effort, given that we won't be able to match Pygments (at least not without a lot of work...and even if we could, it might be too slow)?"

State of syntax highlighting

As the author of pythontex and the new maintainer of minted, here are my thoughts on the state of syntax highlighting.

  • The main disadvantage of tools that use Python, in my view, has been performance. They either require two compiles (pythontex), or are slow for one compile (minted). But I've added caching to the development version of minted, so I think that is solved.
  • A second disadvantage of minted is the potential security issues of using \write18. Maybe someone can figure out a way to have things like Pygments added to a whitelist of sorts. (pythontex doesn't use \write18 due to the way it uses two compiles with a Python script run in between, so it could be secure for highlighting. But I haven't tried to make it secure, because it's made for executing Python code, and syntax highlighting is just there for convenience.)
  • I don't see any reason that a TeX interface for customizing a Pygments lexer couldn't be created, particularly if the power of LuaTeX is invoked. I expect that settings could be collected on the TeX side, and then automatically plugged into a Python template that customizes the Pygments lexer. (That being said, I don't have any plans to attempt this, at least in the near future, due to time constraints.)
  • The thing I would really like to see for syntax highlighting is an updated, easily customized version of fancyvrb. It would be a nice basis for building future syntax highlighting packages. In particular, the following features would be useful.
    • Built-in support for Unicode (patch VerbatimOut to use \detokenize, etc.)
    • Support for automatic breaking of long lines.
    • Error checking for font-related issues (tildes can be raised like a superscript due to font issues, backticks can require upquote, etc.)
    • Built-in support for creating an environment with a custom name and then automatically numbering the lines for such environments consecutively, so that, for example, all Python code uses one numbering while all C code uses another.
    • Default setup that plays nicely with framing packages like framed, mdframed, tcolorbox, etc. Or at least a setup that is made to work with one of them really well.
    • Built-in macros for creating styles.
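To illustrate the point about custom-named environments and per-language numbering, here is a minimal sketch of what fancyvrb offers as it stands. \DefineVerbatimEnvironment and the numbers and firstnumber keys are existing fancyvrb features; the environment names PyCode and CCode are hypothetical.

```latex
\documentclass{article}
\usepackage{fancyvrb}

% One custom-named environment per language (names are made up).
% firstnumber=last continues from wherever the previous listing stopped.
\DefineVerbatimEnvironment{PyCode}{Verbatim}{numbers=left,firstnumber=last}
\DefineVerbatimEnvironment{CCode}{Verbatim}{numbers=left,firstnumber=last}

\begin{document}
\begin{PyCode}
print("one")
print("two")
\end{PyCode}

\begin{CCode}
int x = 3;
\end{CCode}
\end{document}
```

Because fancyvrb keeps a single line counter (FancyVerbLine), the C line here continues the numbering of the Python block instead of starting its own sequence. Separate numbering per environment currently has to be added by hand, for example by saving and restoring the counter, which is why built-in support would be welcome.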