[Tex/LaTex] Are there any non-WEB re-implementations of the TeX core in recent years?

tex-core tex-history

I am looking for re-implementations, in recent years, of the TeX core algorithm (which Knuth wrote in ca. 1982).

It would seem to me that LaTeX, XeTeX, LuaTeX all call the TeX core, which is written in the quaint Pascal-WEB.
They convert the WEB program to Pascal or C by automated tools, right?
There does not seem to be any fresh rewrite of the TeX core in a more modern language.

Related question: More Modern Reimplementations of TeX, but that was 8 years ago.
If there is any project from this decade that attempts a radical new rewrite, I would be very willing to study the code.

  1. As I understand it, LuaTeX is a partial rewrite of the TeX core in CWEB, right?

  2. Besides LuaTeX, is there any complete (or almost complete) re-implementation of the whole TeX core algorithm that does not use WEB?
    For example, someone reportedly implemented the TeX core algorithm in Clojure, but I cannot find any trace of it on GitHub.
    Additionally, there is a paper whose authors say they implemented the TeX core algorithm in SML, but that was a long time ago, and the source does not seem to have been made public.

Since I am citing new material, this is not a duplicate, I think.
Of course, do correct me if any statement is wrong, or any link is mistaken.

Best Answer

I'm trying to collect a list of TeX implementations here. The following is what I know.

  • Firstly, LuaTeX is a complete and full-featured extension of TeX, widely used today and in active development (until recently at least). It does not have any WEB code: the code was manually translated (not an automatic translation into ugly code, the way it happens in the internals of the pdfTeX/XeTeX build process) into C a long time ago. See the article LuaTEX says goodbye to Pascal (MAPS 39, EuroTeX 2009) by Taco Hoekwater for informative details. As a running example, I'll use the scan_left_brace example from the article: see the corresponding code in the LuaTeX sources which is currently here. (In some places I've seen mentions of CWEB but I don't know why; this is just C code with comments.)

  • If Pascal is your only problem, there is Martin Ruckert's web2w project (webpage, TUGboat article), which is indeed a translation into CWEB (using an automated tool, but one that cares about the readability of the generated output). Working off this translated code, Prof. Ruckert has managed to further develop an ebook reader application (HINT) including a second version that he demonstrated at TUG 2019. The corresponding section in the typeset CWEB listing looks like this:

    [image: web2w ctex]

    generated from the .w source where it looks like:

    @* Basic scanning subroutines.
    Let's turn now to some procedures that \TeX\ calls upon frequently to digest
    certain kinds of patterns in the input. Most of these are quite simple;
    some are quite elaborate. Almost all of the routines call |get_x_token|,
    which can cause them to be invoked recursively.
    @^stomach@>
    @^recursion@>
    
    @ The |scan_left_brace| routine is called when a left brace is supposed to be
    the next non-blank token. (The term ``left brace'' means, more precisely,
    a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to
    appear before the |left_brace|.
    
    @p void scan_left_brace(void) /*reads a mandatory |left_brace|*/ 
    {@+@<Get the next non-blank non-relax non-call token@>;
    if (cur_cmd!=left_brace) 
      {@+print_err("Missing { inserted");
    @.Missing \{ inserted@>
      help4("A left brace was mandatory here, so I've put one in.")@/
        ("You might want to delete and/or insert some corrections")@/
        ("so that I will find a matching right brace soon.")@/
        ("(If you're confused by all this, try typing `I}' now.)");
      back_error();cur_tok=left_brace_token+'{';cur_cmd=left_brace;
      cur_chr='{';incr(align_state);
      } 
    } 
    
  • If both Pascal and WEB/CWEB-style literate programming are problematic, there is rsTeX (my favourite for reading, maybe because I had a small part to play in its becoming available online). This is a manual translation of the WEB code into (minimal) C++ (see comments at the top of the file), and the corresponding scan_left_brace is here.

  • If you want to get more esoteric, there are a bunch of obsolete or incomplete implementations, in various states of completeness and of compatibility with present-day compilers. Many of them seem to derive from the manual translation that Pat Monardo made in the late 1980s/early 1990s, which was called CommonTeX. See the corresponding code section in the different repositories here (CommonTeX), here (VorTeX) (see Pehong Chen's thesis), here (cpptex), here (tex++), and a few more that don't seem to derive from CommonTeX (NTS and ExTeX are in Java, etc.). See the respective descriptions on GitHub. There are also some partial re-implementations listed.

  • As Barbara Beeton mentioned in a comment, there is Doug McKenna's "JSBox" (where the "JS" doesn't stand for "JavaScript" but "Johann Sebastian"), said to be a full rewrite of TeX and to pass the trip test. At TUG 2019 he demonstrated an impressive ebook / app which consists of TeX being used to typeset all the text around certain interactive figures. The source code is not (yet?) available. There was no video recording at TUG 2019, but two videos from TUG 2014 are available here and here, along with a TUGboat article.

  • If you are aware of any others, please let me know.
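
For readers who want to see the running example stripped of all WEB/CWEB markup, the logic of scan_left_brace can be sketched in plain C. This is only a toy under stated assumptions, not the code of any of the projects listed above: the token array, the token struct, the error_count counter and the scan_from helper are hypothetical stand-ins for TeX's real scanner state and error machinery, and the module "Get the next non-blank non-relax non-call token" is expanded inline as a loop that skips spacer and relax commands.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative command codes; treat the numeric values as assumptions. */
enum { relax = 0, left_brace = 1, spacer = 10, letter = 11 };

/* Hypothetical token representation: a command code plus a character. */
typedef struct { int cmd; int chr; } token;

/* Toy scanner state: an array of tokens replaces TeX's input stack. */
static const token *input;
static int input_len, input_pos;
static int cur_cmd, cur_chr;
static int align_state;
static int error_count;

/* Stand-in for get_x_token: fetch the next token, no macro expansion. */
static void get_x_token(void)
{
    assert(input_pos < input_len); /* the toy does not model end of input */
    cur_cmd = input[input_pos].cmd;
    cur_chr = input[input_pos].chr;
    input_pos++;
}

/* Stand-in for back_error: un-read the offending token, count the error. */
static void back_error(void)
{
    input_pos--;
    error_count++;
}

/* Reads a mandatory left brace, skipping blanks and \relax before it. */
void scan_left_brace(void)
{
    do {
        get_x_token();
    } while (cur_cmd == spacer || cur_cmd == relax);

    if (cur_cmd != left_brace) {
        fprintf(stderr, "! Missing { inserted\n");
        back_error();
        /* Pretend a left brace was seen, as the original routine does. */
        cur_cmd = left_brace;
        cur_chr = '{';
        align_state++;
    }
}

/* Hypothetical helper: reset the state, scan once, return the error count. */
int scan_from(const token *stream, int n)
{
    input = stream;
    input_len = n;
    input_pos = 0;
    align_state = 0;
    error_count = 0;
    scan_left_brace();
    return error_count;
}
```

Calling scan_from on a stream consisting of a space, a \relax and a left brace should report no error, while a stream with a space followed by a letter should report one error and leave the letter unread, so that scanning resumes at it after the inserted brace.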