[Tex/LaTex] TeX’s algorithms for “breaking paragraph into lines” and “setting the glue”

tex-core

I want to learn more about the implementation of TeX's text-justification algorithm (setting the glue).

After scanning through ch.12 of Knuth's TeXbook, I'm guessing that setting the glue must consist in a constrained linear program

  • whose variables represent the interword glue, and
  • whose constraints correspond to shrinks & stretches allowed for each interword glue.

However, if my intuition is correct, the linear program in question must get prohibitively large pretty quickly as the length of the document increases, at least, for computers back in the day. Does setting the glue use some heuristic instead?

EDIT: Thanks to egreg for his answer. However, I realise I should have included the paragraph-making algorithm in my question.

I'm still not quite clear on the interplay between

  • the paragraph-making algorithm (which, I presume, determines where line breaks occur), and
  • the text-justification algorithm (which determines the interword glue).

It seems to me that the two problems are not independent. Does the paragraph-making algorithm, in the process of generating line break after line break, not use information about each glue item (natural width, stretch and shrink components), too?

Best Answer

There's no linear programming involved.

TeX stores items in lists (vertical, horizontal or math lists). Math lists are converted to horizontal lists, so only the first two must be discussed, but they are alike when setting the glue is involved.

Glue is stored as a glue item, represented with its natural width, stretch component and shrink component.

When TeX has to build a box having a certain width from a horizontal list or a certain height from a vertical list (lines in a paragraph or a page, for instance), it sums up all the natural widths, adds the width of the non-glue items and computes how much the glue has to be stretched or shrinked.

If stretching s is required and t total stretch is available, TeX computes the "stretch ratio" r = s/t; and if a glue item has x stretch component, TeX adds rx to the natural width.

The situation is similar for shrinking, with the difference that TeX will never go below the stated shrinking and will output an overfull box if the total shrinking available is not sufficient.

There is a little complication because stretching or shrinking can be expressed in "infinite units". Well, the computation of the total stretching or shrinking available keeps the higher order of infinity resulting from the computation and adds (or removes) from the natural width only to the glue items which match that order of infinity in their specification.

These are very easy and fast operations on any computer.


The paragraph making algorithm takes into account stretchability and shrinkability of the glue items in the horizontal list it has built only in an indirect way.

Glue items mark feasible line break points, provided they are preceded by a non-discardable item (also other items mark feasible breaks). During this phase TeX looks at the natural widths of the items and, to put it simply, chooses a sequence of break points such that no line will exceed the current \hsize.

Actually this choice is based on many parameters, on penalties found in the list (or implicit penalties) and on computed demerits. Possible break points are also discretionary hyphens that may have been automatically inserted during the process.

For each feasible break point, TeX computes the glue ratio for the line that would result, which assigns a badness to the line that will be used in the computation of the total demerits. However glue is not set at this point, but only when TeX has chosen the best sequence of break points (the best according to its rules and the values of the various involved parameters): the lines are put in hboxes of width \hsize, essentially \hbox to \hsize{...} and now the glue is set as explained above.