[Tex/LaTex] What’s new in TeX, version 3.141592653

A new version of TeX was released by Donald Knuth earlier this year (January/February 2021), and its version number is now 3.141592653. What has changed since the previous release, version 3.14159265, from seven years ago (January 2014)?

For those who don't know, Knuth writes on his website that:

I still take full responsibility for the master sources of TeX, METAFONT, and Computer Modern. Therefore I periodically take a few days off from my current projects and look at all of the accumulated bug reports. This happened most recently in 1992, 1993, 1995, 1998, 2002, 2007, 2013, and 2020; following this pattern, I intend to check on purported bugs again at the end of years 2028, 2037, 2047, etc. The intervals between such maintenance periods are increasing, because the systems have been converging to an error-free state.

and, accordingly, that:

This is the year when I promised to do the seven-year cleanup of TeX and METAFONT (last updated at the beginning of 2014). It took me three weeks to wade through several hundred issues of many different kinds, and five actual bugs were discovered(!). But I'm happy to report that the newly-tuned-up systems don't “break” any of the things that used to work, and the bugs weren't likely to bite. Everyone can therefore upgrade or not, at their convenience.

So what are the "five actual bugs"? Under what (rare) circumstances do they matter?


(Some notes on this question itself:

  • With the previous two updates, Knuth published a summary of the changes in TUGboat, namely The TeX tuneup of 2014 and The TeX tuneup of 2008. I expect there will similarly be an article "The TeX tuneup of 2021", but last time, the question and answer were posted on this website even before that article was published…)

Best Answer

The total changes made by Knuth for the 2021 tuneup were huge (there were several people scrutinizing his work this time around). There were several small changes, from typo corrections in older errata to a rewording of the copying conditions in tex.web, and even some changes to the plain format itself. Here are the entries added to errorlog.tex and a small[1] explanation of each. (The “five bugs” Knuth mentions are I948, S949, S950, B952, and R953. Two other changes were not added to errorlog.tex; they are at the bottom of this answer.) A shorter version of this summary is also now available at the TUG website.

* 15 January 2021
I948. Don't pause on errors when tracing paragraphs (Udo Wermuth). @826
S949. Don't try to interact when in |\batchmode| (Xiaosa Zhang). @83
S950. Don't try to edit when no file is active (Xiaosa Zhang). @84
R951. Take date and time sometimes from system, not user (Udo Wermuth). @241,536
B952. Don't allow implicit left brace after |#| (Udo Wermuth). @476
R953. After nine parameters, must delete offending tokens (Bruno Le Floch). @476
D954. Garbage visible in buffer after file ends prematurely (DRF). @486
R955. Force nonexistent characters to have null specs (DRF). @722
C956. Don't mark fraction noads as temporarily Inner (DRF). @761
Q957. Reset |\newlinechar| before logging the stats (Udo Wermuth). @1333,1335

[1]: I wrote the “small explanation” in the intro before actually writing the explanations, so I didn't lie... technically :)


I948. Don't pause on errors when tracing paragraphs (Udo Wermuth). @826

This bug would cause TeX to apparently hang when \tracingparagraphs was on (> 0) and the Infinite glue shrinkage error occurred while the paragraph trace info was being printed. When \tracingparagraphs is on, TeX writes the trace to log_only (unless \tracingonline is positive). On that error it would prompt the user for interaction, but since the selector was redirecting the output to the .log, the user would never see the prompt, and TeX would appear to be stuck.

For example, if you run on a file that contains this line:

\tracingparagraphs=1 Press\hss return.\end

the terminal will show

This is TeX, Version 3.14159265 (TeX Live 2020) (preloaded format=tex)
 restricted \write18 enabled.
entering extended mode
(./test.tex

and hang there, waiting for user interaction (for example, typing x then <RETURN> to end the run, or just <RETURN> to ignore the offending \hss). The hang does not occur if you run

$ tex '\tracingparagraphs=1 Press\hss return.\end'

because in that case there is no .log open yet, so the output will be to the terminal.

After the tuneup, TeX will treat this error as it treats any other error: in \errorstopmode it asks the user for interaction, otherwise it scrolls past it:

This is TeX, Version 3.141592653 (TeX Live 2021/dev) (preloaded format=tex)
(./test.tex
! Infinite glue shrinkage found in a paragraph.
<inserted text> \par 
                     
<to be read again> 
                   \end 
l.1 \tracingparagraphs=1 Press\hss return.\end
                                              
? 

The relevant change entry is:

429. Don't echo error message to terminal when tracing paragraphs
(Udo Wermuth, 15 January 2017)
@x module 826
  begin no_shrink_error_yet:=false;
@y
  begin no_shrink_error_yet:=false;
  @!stat if tracing_paragraphs>0 then end_diagnostic(true);@+tats@;
@z
@x
  error;
@y
  error;
  @!stat if tracing_paragraphs>0 then begin_diagnostic;@+tats@;
@z
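
To see why the prompt was invisible, here is a rough Python model of TeX's output selector. It is only a sketch: the class Printer and the function shrink_error are names I made up, and only the selector handling of begin_diagnostic and end_diagnostic is imitated, nothing else of the error routine.

```python
# Simplified model of TeX's output selector (illustrative names, not tex.web itself).
LOG_ONLY, TERM_AND_LOG = 18, 19  # selector values used in tex.web


class Printer:
    def __init__(self, tracing_online=0):
        self.selector = TERM_AND_LOG
        self.old_setting = TERM_AND_LOG
        self.tracing_online = tracing_online
        self.terminal = []  # what the user sees
        self.log = []       # what ends up in the .log

    def print(self, text):
        if self.selector == TERM_AND_LOG:
            self.terminal.append(text)
        self.log.append(text)  # both selectors modelled here write to the log

    def begin_diagnostic(self):
        # Tracing output goes to the log only, unless \tracingonline > 0.
        self.old_setting = self.selector
        if self.tracing_online <= 0 and self.selector == TERM_AND_LOG:
            self.selector = LOG_ONLY

    def end_diagnostic(self):
        self.selector = self.old_setting


def shrink_error(printer, fixed):
    """Report the error while a paragraph trace is in progress."""
    printer.begin_diagnostic()      # the trace has already started
    if fixed:
        printer.end_diagnostic()    # change 429: leave trace mode first
    printer.print("! Infinite glue shrinkage found in a paragraph.")
    printer.print("? ")             # the interactive prompt
    if fixed:
        printer.begin_diagnostic()  # ... and resume tracing afterwards


old, new = Printer(), Printer()
shrink_error(old, fixed=False)
shrink_error(new, fixed=True)
print("old terminal:", old.terminal)
print("new terminal:", new.terminal)
```

In the old model the prompt lands only in the log list, which matches the "hang" described above; with the two extra calls it reaches the terminal as well.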

S949. Don't try to interact when in |\batchmode| (Xiaosa Zhang). @83

This bug, initially reported here, was causing TeX to ask for user interaction (TeX's ? prompt) while in \batchmode, thus trying to write to a closed \write stream, and this would cause a segmentation fault. From Karl's answer to the original report, you can reproduce that error in TeX up to 2020 by running tex -ini then typing these lines, ending them with <RETURN>:

\catcode`\^=7 \catcode`\^^?=15 \s^^?E
1
q
v

With the new TeX, after typing q to enter \batchmode, TeX won't try to ask for user interaction if it can't, so that won't break anymore.

The relevant change entry is:

430. Defeat interactions during batch mode (Xiaosa Zhang, 27 June 2020)
@x module 83
@ @<Get user's advice...@>=
loop@+begin continue: clear_for_error_prompt; prompt_input("? ");
@y
@ @<Get user's advice...@>=
loop@+begin continue: if interaction<>error_stop_mode then return;
  clear_for_error_prompt; prompt_input("? ");
@z

S950. Don't try to edit when no file is active (Xiaosa Zhang). @84

This bug, initially reported here, was triggered when you tried to open the editor (using TeX's E option) after an error happened in input given interactively. Suppose you have a file called h.tex with a single line (assuming that \ERROR is undefined, or is anything else that would cause an error):

% h.tex
\ERROR

then, when TeX complains about ! Undefined control sequence \ERROR you reply:

I\MISTAKE V

which will insert \MISTAKE V for TeX to process, and it will once again complain, due to the undefined \MISTAKE, and now you reply:

E

and TeX will segfault.

Here's the transcript of the interactive session:

$ tex h
This is TeX, Version 3.14159265 (TeX Live 2020) (preloaded format=tex)
(./h.tex
! Undefined control sequence.
l.1 \ERROR
          
? I\MISTAKE V
! Undefined control sequence.
<insert>   \MISTAKE
                    V
l.1 \ERROR
          
? E
No pages of output.
Transcript written on h.log.
Segmentation fault (core dumped)

This error would happen because TeX would try to tell you the name of the input file in which the error occurred, but since the error was on an interactively input command, there is no associated file. After the tuneup TeX knows that in that case it is not reading from a file, so it won't try to give you a file name.

The relevant change entry is:

431. Don't exit to editor if no input file is at the bottom line
(Xiaosa Zhang, 03 July 2020)
@x module 84
"E": if base_ptr>0 then
@y
"E": if base_ptr>0 then if input_stack[base_ptr].name_field>=256 then
@z
@x module 85
if base_ptr>0 then print("E to edit your file,");
@y
if base_ptr>0 then if input_stack[base_ptr].name_field>=256 then
  print("E to edit your file,");
@z

R951. Take date and time sometimes from system, not user (Udo Wermuth). @241,536

Before setting \jobname (more precisely before starting the .log file) you could change the value of \year, \month, \day and \time, and that would be written in the header line of the .log. If you did

$ tex '\day=99 \end'

the first line of the .log would say something like

This is TeX, Version 3.14159265 (TeX Live 2020) (INITEX)  99 FEB 2021 22:18

(note the Feb 99th :) or, if you were feeling really devious, you could print any three bytes from TeX's executable by setting a bogus value of \month, like (with the build I have here)

$ tex -ini "\month=-54 \end"

to get a month called TeX:

This is TeX, Version 3.14159265 (TeX Live 2020) (INITEX)  2 TeX 2021 22:20

or with an extreme enough value you could make TeX crash with a segmentation fault, for example:

$ tex '\month=-100000 \end'

With the new version, the value printed in the header comes from the internal variables sys_(time|day|month|year), which can't be altered by changing the primitive registers.

This is a rather long (in number of lines) change, and not so interesting as reading material here (basically: declare new variables sys_<thing>, initialise the primitives to those, and use sys_<thing> instead of <thing> to print the banner), so I will omit the change entry, but you can find it by searching for its header

432. Keep date and time in system variables, use them in opening banner
(Udo Wermuth, 11 December 2020)

in tex82.bug.
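
For the curious, the banner logic is easy to imitate. The following Python sketch models how the date part of the header is assembled from the month table; the function names and the dictionaries system and user are my own inventions, and the out-of-range case raises an exception instead of reproducing the stray memory read.

```python
# Model of the date part of the .log banner (after module 536 of tex.web).
MONTHS = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"


def print_two(n):
    """Two least significant decimal digits, as TeX's print_two does."""
    n = abs(n) % 100
    return f"{n // 10}{n % 10}"


def banner_date(day, month, year, time):
    # tex.web indexes months[3*month-2 .. 3*month], counting from 1.
    first, last = 3 * month - 2, 3 * month
    if first < 1 or last > len(MONTHS):
        # A compiled TeX would read whatever lies outside the array here;
        # this model refuses instead of imitating that.
        raise IndexError(f"month {month} reads positions {first}..{last}")
    name = MONTHS[first - 1:last]
    return f"{day} {name} {year} {print_two(time // 60)}:{print_two(time % 60)}"


# Hypothetical system clock reading and user-modified registers.
system = {"day": 2, "month": 2, "year": 2021, "time": 22 * 60 + 18}
user = dict(system, day=99)

print(banner_date(**user))    # old behaviour: the user's \day is shown
print(banner_date(**system))  # new behaviour: sys_day etc. are shown
```

Passing the user registers corresponds to the old banner, passing the system values to the new one. A month such as -54 makes the index range fall outside the 36-character table, which is where the old TeX picked up unrelated bytes.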


B952. Don't allow implicit left brace after |#| (Udo Wermuth). @476

This bug (which I was surprised wasn't found before) allowed you, when the last token of the <parameter text> of a definition was a parameter character (# with category code 6), to use an implicit begin-group character (like \bgroup) in place of the explicit begin-group character that marks the end of the <parameter text>, such that

\def\foo#1#\bgroup(#1)}
\show\foo

was valid, and would show

> \foo=macro:
#1\bgroup ->(#1)\bgroup .

on the terminal, meaning that the parameter #1 of \foo was delimited by \bgroup, and that \bgroup would be reinserted after the <replacement text> of the macro, exactly as TeX does with an explicit begin-group character. After the tuneup, you will get an error from the definition above:

! Parameters must be numbered consecutively.
<to be read again> 
                   \bgroup 
l.1 \def\foo#1#\bgroup
                      (#1)}
?

and further errors will follow due to the malformed definition (after some errors, the macro defined by the input above will be \foo=macro:#1#2\bgroup (#31)->.).

The relevant change entry is:

434. Don't accept an implicit left brace after # in macro head
(Udo Wermuth, 20 May 2020)
@x module 476
if cur_cmd=left_brace then
@y
if cur_tok<left_brace_limit then
@z
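
The one-line change swaps a test on the command code for a test on the token itself. Here is a small Python sketch of the difference, using the token encoding of tex.web as I understand it (character tokens are 256*cmd+chr, control sequence tokens are cs_token_flag plus a table location). The helper names and the location 1234 are made up for illustration.

```python
# Token encoding as in tex.web: a character token is 256*cmd + chr,
# a control sequence token is cs_token_flag + (its eqtb location).
LEFT_BRACE = 1             # command code of a begin-group character
LEFT_BRACE_LIMIT = 0o1000  # first token value past the explicit left braces
CS_TOKEN_FLAG = 0o7777


def char_token(cmd, char):
    return 256 * cmd + ord(char)


def cs_token(location):
    return CS_TOKEN_FLAG + location


# (cur_cmd, cur_tok) pairs as get_token would report them.
explicit_brace = (LEFT_BRACE, char_token(LEFT_BRACE, "{"))
bgroup = (LEFT_BRACE, cs_token(1234))  # 1234 is a made-up eqtb location


def ends_parameter_text(cur_cmd, cur_tok, fixed):
    if fixed:
        return cur_tok < LEFT_BRACE_LIMIT  # explicit character tokens only
    return cur_cmd == LEFT_BRACE           # also true for \let equivalents


for name, (cmd, tok) in [("{", explicit_brace), ("\\bgroup", bgroup)]:
    print(name, ends_parameter_text(cmd, tok, False),
          ends_parameter_text(cmd, tok, True))
```

Both tokens carry the left_brace command code, so the old test accepted either; only the explicit { has a token value below left_brace_limit, so the new test rejects \bgroup.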

R953. After nine parameters, must delete offending tokens (Bruno Le Floch). @476

With this bug you could have TeX do some real funny things. When scanning the <parameter text> of a macro, after the nine allowed parameters, any # will raise an error, but the token following that # would be left in the <parameter text>. Suppose you had a macro with 9 parameters, and tried to add a tenth parameter #0:

\def\foo#1#2#3#4#5#6#7#8#9#0{}
\show\foo

TeX would complain to you that

! You already have nine parameters.
l.1 \def\foo#1#2#3#4#5#6#7#8#9#0
                                {}
? h
I'm going to ignore the # sign you just used.

? 

and the macro definition would have the # ignored, but the 0 would remain there:

> \foo=macro:
#1#2#3#4#5#6#7#8#90->.
l.2 \show\foo
             
? 

So far, nothing exciting. But now suppose that one day you were feeling extra naughty and used ## instead of #0, as in \def\foo#1#2#3#4#5#6#7#8#9##{}; then \show\foo would say

> \foo=macro:
#1#2#3#4#5#6#7#8#9##->.
l.2 \show\foo
             
? 

and would you look at that! #9 is now delimited by a parameter token, so if you called \foo 12345678hello#, #9 would be hello!

Even worse, you could trick TeX's scanner into grabbing a } as the argument of a macro without errors (after the two You already have nine parameters errors, of course). This example from the original bug report shows that:

\def\foo#1#2#3#4#5#6#7#8#9#}##{\show#9}
\show\foo
\foo12345678} }#
\end %        ^^ delimiter

In the example you have a macro delimited by }# (the tokens left after TeX removed the two extra #), and as part of the scanning, the first } would not go through the Argument of \foo has an extra } error, so it would be added to the current parameter, then the \show#9 will say:

> end-group character }.
<argument> }
             
\foo #1#2#3#4#5#6#7#8#9}##->\show #9
                                    
l.39 \foo12345678} }#
                     
? 

After the 2021 tuneup, TeX now understands that you meant for #0 to be a parameter, so naturally the 0 should be removed as well, so now the error message says a little more:

! You already have nine parameters.
l.1 \def\foo#1#2#3#4#5#6#7#8#9(#0
                                 ){}
? h
I'm going to ignore the # sign you just used,
as well as the token that followed it.

?

and the definition will contain no trace of your tenth parameter:

> \foo=macro:
#1#2#3#4#5#6#7#8#9()->.
l.3 \show\foo
             
?

The relevant change entry is:

433. After nine parameters, delete both # and the token that follows
(Bruno Le Floch, 22 October 2020)
@x module 473
label found,done,done1,done2;
@y
label found,continue,done,done1,done2;
@z
@x module 474
begin loop begin get_token; {set |cur_cmd|, |cur_chr|, |cur_tok|}
@y
begin loop begin continue: get_token; {set |cur_cmd|, |cur_chr|, |cur_tok|}
@z
@x module 476
  help1("I'm going to ignore the # sign you just used."); error;
@y
  help2("I'm going to ignore the # sign you just used,")@/
    ("as well as the token that followed it."); error; goto continue;
@z
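
The effect of the goto continue can be imitated with a toy scanner. This Python sketch is a heavy simplification and not TeX's real algorithm: every character counts as one token, control sequences are not handled, and the digit after each # is assumed to be the right one. The function name scan_parameter_text is my own.

```python
def scan_parameter_text(text, fixed):
    """Return (parameter text as \\show prints it, number of errors)."""
    tokens = iter(text)
    stored, params, errors = [], 0, 0
    for tok in tokens:
        if tok in "{}":              # an explicit brace ends the scan
            break
        if tok == "#":
            follower = next(tokens, None)
            if follower is None:
                break
            if follower == "{":      # the special "#{" ending
                stored.append("{")
                break
            if params == 9:          # "You already have nine parameters"
                errors += 1
                if not fixed:        # old: only the # was dropped
                    stored.append("##" if follower == "#" else follower)
                continue             # new: the follower is dropped too
            params += 1
            stored.append(f"#{params}")
            continue
        stored.append(tok)
    return "".join(stored), errors


nine = "#1#2#3#4#5#6#7#8#9"
for tail in ["#0{", "##{", "#}##{"]:
    print(scan_parameter_text(nine + tail, fixed=False),
          scan_parameter_text(nine + tail, fixed=True))
```

With fixed=False the three inputs leave 0, ## and }## behind in the parameter text, as in the \show output above; with fixed=True nothing survives after #9.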

D954. Garbage visible in buffer after file ends prematurely (DRF). @486

With this bug, the error message File ended within \read could be followed by garbage context if the circumstances were right. Before the tuneup, if you were \reading from a file with one { too many, you could see that garbage. Suppose a file unbal.tex contains the single line:

{

and then you run the following document:

\catcode`{=1 \catcode`}=2 \catcode`#=6
\openin1 unbal
\def\A#1#2#3#4#5#6#7#8#9{\read1to \x}
\def\B#1#2#3#4#5#6#7#8#9{\A#1#2#3#4#5#6#7#8#9 \relax}
\def\C#1#2#3#4#5#6#7#8#9{\B#1#2#3#4#5#6#7#8#9 \relax}
\def\D#1#2#3#4#5#6#7#8#9{\C#1#2#3#4#5#6#7#8#9 \relax}
\def\E#1#2#3#4#5#6#7#8#9{\D#1#2#3#4#5#6#7#8#9 \relax}
\E123456789 \end

the error message would start with

Runaway definition?
->{ 
! File ended within \read.
<read 1> {^^M7#8#9{\D

where ^^M7#8#9{\D are the leftovers in the buffer variable.

After the tuneup, TeX now cleans up the buffer and the error context is correct:

Runaway definition?
->{ 
! File ended within \read.
<read 1> 

The relevant change entry is:

435. Keep garbage out of the buffer if a |\read| end unexpectedly
(DRF, 17 February 2018)
@x module 486
    align_state:=1000000; error;
@y
    align_state:=1000000; limit:=0; error;
@z

R955. Force nonexistent characters to have null specs (DRF). @722

This one didn't have a visible effect on normal usage of TeX, so no compilable example for this one (mostly because I failed to produce a bad .tfm file to trigger this bug).

In a .tfm file, a nonexistent character is marked by its width index being zero, and TeX assumes that in that case all other metrics of the character are zero as well. Nothing enforced that assumption, though: when reading a character from a font, TeX would only look at the width index. So if a .tfm file were crafted so that the width index was zero but, for example, the italic correction index was not, that index would not be zeroed and a wrong italic correction would be used.

After the tuneup, if the width index of a character is zero, TeX nullifies the entire character to make sure. The relevant change entry is:

436. Zero out nonexistent chars, to prevent rogue TFM files
(DRF, 06 October 2020)
@x module 722
    math_type(a):=empty;
@y
    math_type(a):=empty; cur_i:=null_character;
@z

C956. Don't mark fraction noads as temporarily Inner (DRF). @761

This bug had a fix in tex.web but it was more of a bug in The TeXbook. In short, some places in The TeXbook, for example the last paragraph on page 155 used to say:

There’s also an eighth classification, \mathinner, which is not normally used for individual symbols; fractions and \left...\right constructions are treated as “inner” subformulas [...]

but fractions have now been removed from that statement. The relevant change entry in tex.web is:

437. Don't classify fraction noads as inner noads (DRF, 25 March 2019)
@x module 761
fraction_noad: begin t:=inner_noad; s:=fraction_noad_size;
  end;
@y
fraction_noad: s:=fraction_noad_size;
@z

so a fraction is no longer made an Inner atom. That does not change any math typesetting, though, because a fraction is usually written as {1\over2}, and the extra braces that enclose the subformula make that fraction an Ord atom for all purposes.

The only ways to get the fraction as an actual Inner atom were if the formula consisted only of a fraction, like $1\over2$, in which case it wouldn't make a difference because of the math boundaries, or if the fraction was enclosed in a \left...\right pair, but then it would become an Inner atom anyway because of \left...\right. All other uses of a fraction would result in an Ord atom due to the braces that delimit the subformula, so this classification was dropped altogether to avoid confusion.

Proof of that can be seen in this example:

\nopagenumbers \loggingall \tracingonline=1
Punct: ${1\over2}.$\par
Ord: ${1\over2}x$
\end

If the fractions were actually Inner atoms, then according to the math spacing table on page 170 of The TeXbook you should get a \thinmuskip after the fraction in both cases (followed by a Punct and followed by an Ord), but if you look at the produced lists, you see that neither has the space:

.......\sevenrm 2
.....\hbox(0.0+0.0)x1.2, shifted -2.5
...\teni :

and

.......\sevenrm 2
.....\hbox(0.0+0.0)x1.2, shifted -2.5
...\teni x
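
The argument boils down to a table lookup. As a sketch, here are the two relevant rows of the spacing table from page 170 of The TeXbook, restricted to the two columns used above and written as a Python dictionary; the names SPACING and space_after are mine, and the rest of the table is left out on purpose.

```python
# Two rows of the inter-atom spacing table (The TeXbook, page 170),
# restricted to the right-hand atoms used in the example.
# "0" = no space, "(1)" = thin space that is omitted in script styles.
SPACING = {
    "Ord":   {"Ord": "0",   "Punct": "0"},
    "Inner": {"Ord": "(1)", "Punct": "(1)"},
}


def space_after(left, right):
    """Space inserted between a left atom and the atom that follows it."""
    return SPACING[left][right]


for left in ("Inner", "Ord"):
    for right in ("Punct", "Ord"):
        print(f"{left} followed by {right}: {space_after(left, right)}")
```

An Inner fraction would get a thin space in both cases, while an Ord gets none, and none is what the box listings show.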

Q957. Reset |\newlinechar| before logging the stats (Udo Wermuth). @1333,1335

Udo Wermuth reported that you could get some weird terminal output from TeX depending on how you set the \newlinechar parameter. For example running

$ tex '\newlinechar=32 \end'

would print

$ tex '\newlinechar=32 \end'
This is TeX, Version 3.14159265 (TeX Live 2020) (preloaded format=tex)
No
pages
of
output.
Transcript
written
on
texput.log.

because it would make TeX use the space character (ASCII 32) as a newline character when writing its output, so all spaces would be converted. Equally interesting outputs could be achieved by using different ASCII codes. This is now corrected, and the command above will generate the much more boring

$ tex '\newlinechar=32 \end'
This is TeX, Version 3.141592653 (TeX Live 2021/dev) (INITEX)
No pages of output.
Transcript written on texput.log.

The relevant change entry is:

440. Normalize newlinechar when printing the final stats
(Udo Wermuth, 29 November 2020)
@x module 1333
begin @<Finish the extensions@>;
@y
begin @<Finish the extensions@>; new_line_char:=-1;
@z
@x module 1335
begin c:=cur_chr;
@y
begin c:=cur_chr; if c<>1 then new_line_char:=-1;
@z
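
What the old TeX did to its own closing messages can be mimicked in a few lines. This Python sketch assumes the simplest possible model, in which every character equal to \newlinechar is printed as a line break; the function name tex_print is made up.

```python
def tex_print(text, new_line_char):
    """Imitate how TeX prints a string when \\newlinechar is set."""
    out = []
    for ch in text:
        # The current new-line character starts a new line instead.
        out.append("\n" if ord(ch) == new_line_char else ch)
    return "".join(out)


message = "No pages of output."
print(tex_print(message, 32))  # old behaviour with \newlinechar=32
print(tex_print(message, -1))  # new behaviour: new_line_char reset to -1
```

Setting new_line_char to -1 before the final messages, as the change entry does, means no character can match, so the text is printed unchanged.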

Missing (\tabskip) indication in Underfull message

Another bug, found by Igor Liferenko, was a missing \tabskip glue indication in the Underfull box message of an alignment, which was otherwise documented in The TeXbook. In this example:

% \catcode`\{=1 \catcode`\}=2 \catcode`\&=4 \catcode`\#=6
\showboxdepth=1 \tracingonline=1
\tabskip=0pt plus10pt \halign to200pt{&#\hfil\cr
  \hbox to50pt{}&\hbox to60pt{}\cr}
\end

the terminal would show

\hbox(0.0+0.0)x200.0, glue set 3.0
.\glue(\tabskip) 0.0 plus 10.0
.\unsetbox(0.0+0.0)x50.0
.\glue(\tabskip) 0.0 plus 10.0
.\unsetbox(0.0+0.0)x60.0
.\glue 0.0 plus 10.0

whereas the last line should be

.\glue(\tabskip) 0.0 plus 10.0

The final (\tabskip) indication now shows properly, as documented.

The relevant change entry is:

438. Properly identify tabskip glue when tracing repeated templates
(Igor Liferenko, 10 January 2020)
@x module 793
link(p):=new_glue(glue_ptr(cur_loop));
@y
link(p):=new_glue(glue_ptr(cur_loop));
subtype(link(p)):=tab_skip_code+1;
@z

Internal variable overflow in \hyphenation

This is probably the least exciting bug found, mainly because with your everyday TeX you can't spot it. It was found by David Fuchs with a special version of TeX that he crafted to check memory boundary violations.

When declaring a \hyphenation, the variable hn, declared as a small_number (within the range 0..63), could be maxed out. Module §930, <Look for the word |hc[1..hn]|...>, which looks for a given word in TeX's exception table, would then call incr(hn), making it larger than its declared range allows. This document does that:

\lefthyphenmin=0
\righthyphenmin=0
\hyphenation{-a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z-a-b-%
  c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z-a-b-c-d-e-f-g-h-i-j-%
  k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z}
\showhyphens{abcdefghijklmnopqrstuvwxyzabcdefg%
  hijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz}
\end

but it's not noticeable when running TeX because the variable overflow doesn't appear. What exactly happens is a bit implementation dependent, as it relies on what the compiler translates a small_number to. If it becomes a variable that holds more than 0..63, nothing bad will happen.

The relevant change entry is:

439. Use the correct range for local variable hn (DRF, 31 October 2020)
@x module 892
@!hn:small_number; {the number of positions occupied in |hc|}
@y
@!hn:0..64; {the number of positions occupied in |hc|;
                                  not always a |small_number|}
@z
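
A range-checking build like the one David Fuchs used can be imitated with a small class. This Python sketch is only an analogy: Subrange and look_up_longest_word are hypothetical names, and the starting value 63 is my assumption for a word that fills hn completely.

```python
class Subrange:
    """A Pascal-like subrange variable that complains when it overflows."""

    def __init__(self, low, high, value=0):
        self.low, self.high = low, high
        self.value = value

    def incr(self):
        if self.value + 1 > self.high:
            raise OverflowError(
                f"{self.value + 1} is outside {self.low}..{self.high}")
        self.value += 1


def look_up_longest_word(high):
    hn = Subrange(0, high, value=63)  # assumed: the word fills hn completely
    hn.incr()                         # the incr(hn) mentioned above
    return hn.value


print(look_up_longest_word(64))       # declared as 0..64: fine
try:
    look_up_longest_word(63)          # declared as small_number: violation
except OverflowError as err:
    print("range violation:", err)
```

A compiler that stores a small_number in a byte or a full integer never notices the extra step, which is why the bug stayed invisible in ordinary builds.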

Conclusion

Of course, the one change (and very likely the only one) people will see is:

-@d banner=='This is TeX, Version 3.14159265' {printed when \TeX\ starts}
+@d banner=='This is TeX, Version 3.141592653' {printed when \TeX\ starts}

:-)

As usual, there were several other changes to all of Knuth's distribution including, but not limited to, Metafont (some changes that were ported from TeX due to similar parts of the code), changes to The TeX and Metafont books, and changes to the plain format (everything really minimal and unlikely to bite any reasonable user document).

There is also a TUGboat article by Don describing the major changes to TeX and Metafont, available on the TUGboat web page.


Disclaimer

Most of the code examples in this answer are not my own, but taken from the original bug reports (some slightly modified), so thanks to the authors of those bug reports, and thanks also to the dedicated people that searched for bugs, and to those who read each and every one of the (probably thousands of) bug reports in the last 7 years. Let's prepare for another 8 of those!
