TeX Core – Understanding Boxes at the Primitive Level in TeX

Tags: boxes, tex-core

I'm learning plain TeX as a nerdy hobby. My current project is to divide the page into unequal columns. I only want to use boxes and glue, with no external dependencies such as eplain. Before dividing the page, I wanted to learn how to frame a box, so that I could learn by visualising what I was doing. My attempt to visualise the writing area, i.e. \vsize by \hsize, produced an overfull \hbox (and an overfull \vbox). My minimal working example is:

\hrule
\hbox to \hsize{
  \vrule
  \vbox to \vsize{
    \vfill
    \hfill
  }
  \vrule
}
\hrule
\bye

The output was:

Overfull \hbox (5.24442pt too wide) detected at line 9
Overfull \vbox (10.4pt too high)

Can someone please explain what is causing the boxes to overflow, given that I am only using glue? My hunch is that I am placing my box inside an existing "default" box, which it is overflowing.

I am using Ubuntu 22.04, and pdftex --version outputs the following:

pdfTeX 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian)
kpathsea version 6.3.4/dev
Copyright 2021 Han The Thanh (pdfTeX) et al.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.
Compiled with libpng 1.6.37; using libpng 1.6.37
Compiled with zlib 1.2.11; using zlib 1.2.11
Compiled with xpdf version 4.03

Thanks.

Best Answer

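Your \hbox to\hsize contains a space from the end of line 2 of your example (after the opening brace), a \vrule (width 0.4pt), the \vbox (width \hsize, because the \hfill inside it starts a paragraph, whose lines are \hsize wide), a space from the end of line 7 (after the brace that closes the \vbox), and another \vrule (width 0.4pt). Even after both spaces have shrunk as far as they can, this material is wider than \hsize by 5.24442pt.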
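The 5.24442pt in the warning can be checked by hand. The figures below assume plain TeX's default text font cmr10, whose interword space is 3.33333pt and can shrink by at most 1.11111pt; because the box is too wide, TeX shrinks both spaces fully before reporting the excess. The result differs from TeX's message in the last digit, presumably because TeX stores dimensions as integer multiples of 1sp = 2^{-16}pt.

```latex
\begin{aligned}
\text{excess}
  &= \underbrace{2 \times (3.33333 - 1.11111)\,\mathrm{pt}}_{\text{two fully shrunk spaces}}
   + \underbrace{2 \times 0.4\,\mathrm{pt}}_{\text{two rules}}\\
  &= 4.44444\,\mathrm{pt} + 0.8\,\mathrm{pt}\\
  &\approx 5.2444\,\mathrm{pt}
\end{aligned}
```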

The height of this box equals the height of the \vbox inside it, i.e. \vsize. The first \hrule sits on the first baseline: TeX puts \topskip glue above it so that its bottom edge is \topskip=10pt below the upper boundary of \box255, the box passed to the output routine. Then comes your box of height \vsize, and then the second \hrule (0.4pt). So \box255 would need a height of 10pt+\vsize+0.4pt, and the output routine reports a \vbox that is 10.4pt too high.
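For comparison, here is one way to frame the text area without either warning. It is a sketch for plain TeX (pdftex or Knuth's tex), not the only approach. The % signs keep line ends from turning into spaces inside the \hbox; the whole frame is a single \vbox, so the \topskip glue above it collapses to zero (the box is taller than \topskip); and the vertical rules are made 0.8pt shorter than \vsize so that the two 0.4pt horizontal rules still fit. The register name \frameinner is a hypothetical choice.

```tex
% Plain TeX: frame exactly the \hsize x \vsize text area with 0.4pt rules.
\newdimen\frameinner            % hypothetical name: height between the two \hrules
\frameinner=\vsize
\advance\frameinner by -0.8pt   % leave room for two 0.4pt \hrules
\vbox to \vsize{%
  \hrule height 0.4pt
  \hbox to \hsize{%             the % stops the line end becoming a space
    \vrule width 0.4pt height \frameinner depth 0pt
    \hfil                       % empty interior: infinite stretch, never overfull
    \vrule width 0.4pt          % running height/depth: matches the \hbox
  }%
  \hrule height 0.4pt
}
\bye
```

The \vbox holds 0.4pt + (\vsize - 0.8pt) + 0.4pt = \vsize of material, so it fits both its own "to \vsize" and the page exactly.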

Related Solutions

[Tex/LaTex] How to disable the TeX primitive $$

This is not really recommended. Anyway, here is a technical solution: in this specific case you could make $ an active character (i.e. one that behaves like a macro) which expands to an ordinary $ of category code 3 (math shift). Two $ in a row are then no longer combined into display math:

\def\mathdollar{$}
\catcode`\$=\active
\let$\mathdollar

All $$ inside previously defined math environments are still OK because they hold the dollars in their original category code.
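A small plain TeX file can show both effects. After the \catcode change, $a+b$$c+d$ gives two separate inline formulas, while a macro defined beforehand still produces display math, because its dollars were tokenized with category code 3. The helper \olddisplay is a hypothetical name used only for this illustration.

```tex
% Plain TeX (pdftex or tex). \olddisplay is a hypothetical helper,
% defined before the catcode change, so its $$ keep category code 3.
\def\olddisplay#1{$$#1$$}

% The trick from the answer: an active $ that expands to an ordinary $.
\def\mathdollar{$}
\catcode`\$=\active
\let$\mathdollar

Two inline formulas side by side: $a+b$$c+d$.

\olddisplay{x^2+y^2=z^2}
\bye
```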

However, I would recommend the following instead: to write your two equations, simply write $equation1${}$equation2$ or $equation1 equation2$. If your (apparently automatically generated) code never uses $$ for display math, it should be safe to search and replace every $$ with ${}$.

[Tex/LaTex] Unicode math at TeX primitive level

In classical TeX, several math fonts supply the output glyphs, and, as observed in the question, which font and slot are used for an input token is determined by its \mathcode. In contrast, when using a Unicode math font, only one font supplies all of the glyphs. So rather than the limited number of slots available in a TeX font, there is a large number of math-mode-specific entries in a Unicode font.

Both Unicode engines (XeTeX and LuaTeX) provide the primitive \Umathcode for setting the extended math codes this requires. Details are in both the XeTeX and the LuaTeX manuals; the syntax is

\Umathcode ⟨char slot⟩ [=] ⟨math type⟩ ⟨fam.⟩ ⟨glyph slot⟩

Notice that a family must still be supplied here, even though it will be the same for every character.

To set up the font dimensions required for math mode working, the engine or a suitable loader has to read the table supplied by the font. In XeTeX this happens as part of the (extended) \font primitive, for example

\font\lmmx = "[latinmodern-math.otf]/OT:mode=base;script=math;"

whilst in LuaTeX a Lua-based loader is required to extend the \font primitive, which out of the box is identical to that in TeX90. (Realistically, the font loader to use with LuaTeX is luaotfload, which is based on the one written for ConTeXt but can be loaded with plain, LaTeX, etc. There is work ongoing to use the HarfBuzz shaper with LuaTeX, but to my knowledge this is not usable at present.)

As only one font is in use, conversion between input and output glyphs requires some differences from classical TeX. For example, input such as

$y = mx + c$

will not give italic letters unless each letter has a \Umathcode pointing to the 'correct' code point, i.e. the one in the Mathematical Italic alphabet. For example, we need

\Umathcode `\y =  "7 "1 "1D466

(I'm assuming that we will use font 1 for all glyphs: this is not required.)

Operators in Unicode math are scaled by the font shaper directly rather than needing extensible parts. As such, something like \int is defined for Unicode use by

\let\int=∫

with the appropriate math code then assigned:

\Umathcode `∫= "1 "1 `∫

Both XeTeX and LuaTeX have the \Uradical primitive for radicals: LuaTeX also has \Uroot.

An important consequence of using only one font is that, for example, making symbols bold requires changing all of the relevant math codes. Thus setting up something like \bf requires that we map over all affected code points and alter their \Umathcode.
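Mapping over the affected code points can be sketched as a loop. This is an illustration under stated assumptions, not how any particular package does it: it assumes XeTeX with the plain format, assumes family 1 holds the Unicode math font (as assumed earlier), handles only the lowercase letters a to z, and uses the hypothetical names \maplowercase, \mathbold, \mapchar and \mapslot. Note that the Mathematical Italic alphabet has a hole: italic h is U+210E (PLANCK CONSTANT), not U+1D455.

```tex
% XeTeX, plain format: remap a..z by looping over code points.
% Assumes family 1 holds the Unicode math font. All names are hypothetical.
\newcount\mapchar
\newcount\mapslot
% #1 = offset from `a to the small a of the target alphabet
\def\maplowercase#1{%
  \mapchar=`a
  \loop
    \mapslot=\mapchar
    \advance\mapslot by #1\relax
    \Umathcode\mapchar="7 "1 \mapslot
  \ifnum\mapchar<`z
    \advance\mapchar by 1
  \repeat}
% Default: Mathematical Italic, U+1D44E = "61 + "1D3ED
\maplowercase{"1D3ED}
\Umathcode`h="7 "1 "210E   % italic h lives at U+210E, not U+1D455
% Bold upright: U+1D41A = "61 + "1D3B9. Math codes obey grouping,
% so the extra braces confine the change to the argument.
\def\mathbold#1{{\maplowercase{"1D3B9}#1}}
```

With this in place, $\mathbold{x}+y$ should give a bold x next to an italic y; the remapping made inside \mathbold ends at its closing brace.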

Whilst only one font is required, math families two and three must still be defined to satisfy the engine that sufficient math parameters are available. (This may change, certainly in LuaTeX, as it seems to be a hold-over of code paths from TeX90.) At the same time, the script-size fonts need to be loaded with a feature (+ssty below) telling the loader what they are. This leads to a minimal font-loading setup along the lines of

\font\lmmx   = "[latinmodern-math.otf]/OT:mode=base;script=math;" %
\font\lmmvii = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=0;" at 7pt %
\font\lmmv   = "[latinmodern-math.otf]/OT:mode=base;script=math;+ssty=1;" at 5pt %
\textfont1 = \lmmx
\textfont2 = \lmmx
\textfont3 = \lmmx
\scriptfont1 = \lmmvii
\scriptfont2 = \lmmvii
\scriptfont3 = \lmmvii
\scriptscriptfont1 = \lmmv
\scriptscriptfont2 = \lmmv
\scriptscriptfont3 = \lmmv

(Again, I am assuming XeTeX font syntax here.)

As noted in comments, there are a large number of additional font dimensions in Unicode math fonts. LuaTeX gives these names (all listed in the LuaTeX manual), whilst for XeTeX they have numbers and are accessed using \fontdimen.

The TeX90 primitives \delimiter, \mathaccent and \radical all have extended Unicode versions: \Udelimiter, \Umathaccent and \Uradical. Unlike the TeX90 versions, \Udelimiter and \Uradical do not need to point to multiple glyph slots: only one slot is needed, and the font shaper is responsible for growing the glyph as required. The syntax of \Umathaccent is significantly extended compared to \mathaccent, certainly for LuaTeX. All three primitives are described in the LuaTeX manual and, to a lesser extent, in the XeTeX one.
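To compare with plain TeX's own definitions, here is a sketch of how \sqrt, \langle and \rangle might be redefined for a Unicode math font. It assumes XeTeX or LuaTeX with the plain format and family 1 holding the Unicode math font (as assumed earlier). Each plain TeX original names a small and a large glyph; each Unicode version names a single code point.

```tex
% XeTeX or LuaTeX, plain format. Assumes family 1 holds the Unicode math font.
% plain.tex: \def\sqrt{\radical"270370 }  (small glyph in family 2,
%            large extensible glyph in family 3)
\def\sqrt{\Uradical "1 "221A }        % U+221A SQUARE ROOT, one slot only
% plain.tex: \def\langle{\delimiter"426830A }
%            \def\rangle{\delimiter"526930B }
\def\langle{\Udelimiter "4 "1 "27E8 } % class 4 (opening), U+27E8
\def\rangle{\Udelimiter "5 "1 "27E9 } % class 5 (closing), U+27E9
```

Inputs such as $\sqrt{x}$ and $\left\langle x \right\rangle$ should then take their glyphs, at whatever size is needed, from the one Unicode font.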
