[TeX/LaTeX] How many lines of code does the original TeX contain?

line tex-core web

I am currently implementing a non-trivial software component (not related to typesetting), and would like to compare it to other components both from the same field and from other fields, including TeX. I have been trying to find a source line count for the original TeX (not LaTeX), but so far have found no information.

I understand that TeX has been implemented in the style of literate programming using WEB, so the source code and the code-level documentation are actually in the same file.

TeX is apparently also implemented in the style of structured programming. I found that one structured program written by Knuth has 50,000 lines of code (source: http://www.literateprogramming.com/lpquotes.html), but that is probably something other than TeX.

I also found the line count of Metafont, which appears to be 23,000 lines of code (source: http://walden-family.com/ieee/dtp-tex-part-1.pdf).

So, how many lines of code is the standard TeX system? References pointing to the original WEB source code are appreciated too; it's not hard to type wc -l.

Related: How many man-years does it take to implement TeX? …although that is about man-years and not source lines.

Best Answer

  • The tex.web file contains 24985 lines at the moment. This is in the literate-programming style, so has a lot of what you might call comments.

  • When passed through tangle (e.g. tangle tex.web), you get Pascal code, but it is comparable to the output of a JavaScript minifier: statements are jammed together on each line, with line breaks at essentially arbitrary places. (This file has about 6115 lines, but that number does not really mean anything.)

  • If you run this tex.p through a Pascal pretty-printer, that may be a better indication of the size of the program: I ran ptop -l 10000 tex.p tex.pretty.p and the resulting file was 20619 lines long. Of course this depends on the prettifier etc.

  • If you instead run tex.web through weave, you get a typeset listing of the program, including the “comments”; this is the way the program is intended to be read, and the form in which it has been published as a book. At standard settings, the resulting PDF has 1379 modules followed by the index, printed over about 500 pages.

  • This tex.web is in some sense not the “full” program but a putative “common denominator” program, intended to be easily portable to any Pascal compiler that was around at the time (1982). When ported to one such computer installation (OS, file conventions, etc.), it would be accompanied by a (hopefully small) “changefile”. (In fact, such a changefile was used even at the original TeX installation at Stanford, where the local editor had text files containing pages, the keyboard included quite a few non-ASCII characters, etc.) So you may want to take such a changefile into account too.
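To see how the raw line count of a WEB file divides between commentary and code, here is a minimal sketch in Python. It is not part of the WEB toolchain, and the function name classify_web_lines is hypothetical. It assumes the usual WEB conventions: a module starts at a line beginning with "@ " or "@*", the definition part starts with "@d" or "@f", and the Pascal part starts with "@p" or a "@<name@>=" line. It is a line-based heuristic, not a real WEB parser, so the counts are only approximate.

```python
from collections import Counter


def classify_web_lines(text):
    """Roughly count the lines in each part of a WEB source.

    Parts: "limbo" (text before the first module), "tex" (commentary),
    "definitions" (@d / @f macros and formats), "pascal" (code).
    This is a heuristic: it only looks at how each line begins and at
    whether it contains the "@>=" that ends a module-name definition.
    """
    counts = Counter()
    state = "limbo"
    for line in text.splitlines():
        lowered = line.lower()
        # A new module always begins with its commentary (TeX) part.
        if line.startswith(("@ ", "@*")) or line == "@":
            state = "tex"
        # The same line may already open the definition or Pascal part,
        # e.g. "@ @<Glob...@>=", so check again after the module start.
        if lowered.startswith(("@d", "@f")):
            state = "definitions"
        elif lowered.startswith("@p") or (state != "pascal" and "@>=" in line):
            state = "pascal"
        counts[state] += 1
    return counts
```

For example, classify_web_lines(open("tex.web", encoding="latin-1").read()) returns a Counter whose "pascal" and "definitions" entries give a rough idea of how many of the file's lines are code rather than prose.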

Finally, the image below (via here), in its middle region (i.e. if you ignore the four big boxes on the outside) shows the rough relative sizes (in terms of lines of code / number of modules) of the different parts of the program:

[Image: Knuth drawing]


Edit: The above was just a quick answer I posted in the morning before going out, but in case there's some further interest:

  • Knuth's 1989 paper The Errors of TeX (DOI 10.1002/spe.4380190702), an updated version of which is reprinted in his collection Literate Programming, contains a definitive and detailed account of TeX's development. His own count, from the paper:

    Now TeX82 is in its third and final phase.* It has grown from the original 4600 statements in SAIL to 1376 modules in WEB, representing about 14,000 statements in Pascal.

    (*A footnote added in 1991, and thus only in the book, mentions the (7-bit→8-bit) “major changes in 1989 that can be said to have inaugurated “Phase 4” of TeX82”.)

  • For more than you want to know: I have started compiling a (currently very incomplete) web page about the program and my attempts to read it. :-)

  • In particular, to get a sense of the size of the program, you may want to look at Richard Sandberg's manual translation of tex.web into C++, which keeps only the code and almost no comments: the .cpp file has 17426 lines (16045 sloc) and the .h file has 3068 lines (2675 sloc).

  • Almost all usage of TeX today is via huge macro packages like LaTeX running on extended TeX programs. So, even without counting the macro packages (which have more lines than TeX itself!), you may want to consider eTeX, pdfTeX, and XeTeX, which are roughly 15%, 50–60%, and (relying heavily on system libraries) 35% bigger than Knuth's TeX, respectively.
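To put the line counts quoted in this answer side by side, here is a small Python sketch. All numbers are the ones given above; the labels and the helper name relative_sizes are my own, and the ratios are only a rough comparison because the measures count different things.

```python
# Line counts quoted in this answer; the labels are illustrative only.
TEX_WEB_LINES = 24985

measures = {
    "tangle output (tex.p)": 6115,
    "pretty-printed Pascal (ptop)": 20619,
    "C++ translation, all lines (.cpp + .h)": 17426 + 3068,
    "C++ translation, sloc (.cpp + .h)": 16045 + 2675,
}


def relative_sizes(measures, baseline):
    """Return each count as a fraction of the baseline line count."""
    return {label: count / baseline for label, count in measures.items()}


for label, ratio in relative_sizes(measures, TEX_WEB_LINES).items():
    print(f"{label:40s} {measures[label]:6d} lines  {ratio:5.0%} of tex.web")
```

The pretty-printed Pascal and the C++ translation both come out at roughly four fifths of tex.web's raw line count, which suggests that the code-only size is fairly stable across these two ways of measuring it.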
