LaTeX Theory – How Symbols are Modeled Under the Hood

symbols, terminology, tex-core

This question is about how LaTeX models symbols under the hood. I would like to get an understanding of the language/terminology involved in constructing a complex text expression such as this:

\prod_{\tilde{x}^2_i \in X}^{\infty} \underset{\cdot}{a}^2_{n_r} + \overline{abc}

I am wondering what the terminology is for symbols that can have an upper and a lower portion, like the \prod, and for surrounding an element with material below, on top and to the sides, like the a in the equation. There is grouping going on, and some nesting. I also noticed that \bar only goes over the middle letter in abc, while \overline goes over all of them; I imagine this is based on some concept of letters and groups. Are there also concepts for treating a letter group like \overline{abc} as a single unit, as opposed to + \overline{abc}, where the + is considered a separate unit? It seems that LaTeX has this figured out internally, so that it can handle the spacing differently in each case.

In addition, it seems that in some cases LaTeX users just draw shapes manually to place lines in the appropriate spots, so there is no standardization there. Basically, I would like to know to what extent a standard terminology exists.

Best Answer

TeX knows thirteen kinds of atoms in math formulas and builds upon them, just as any formula in mathematics is built from atomic ones.

The atoms are Ord, Op, Rel, Bin, Open, Close, Punct, Inner, Over, Under, Acc, Rad and Vcent.

Actually only the first eight are eventually considered, because the last five are converted to Ord ones.

Every atom has three fields: nucleus, subscript and superscript, each of which can in turn contain other atoms. Again the last five types are special in this respect, because only the nucleus makes real sense for them.
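A small sketch may help to see the three fields at work. The formulas below are illustrative choices of mine, not taken from the question, apart from the a_{n_r} fragment.

```latex
\documentclass{article}
\begin{document}
% One Ord atom: nucleus x, subscript i, superscript 2.
% The order in which the two scripts are typed does not matter.
$x_i^2 \quad x^2_i$

% The braces make a new Ord atom whose nucleus is the subformula x_i;
% the superscript 2 now belongs to that outer atom.
${x_i}^2$

% A field can hold atoms of its own: the subscript of a is an
% Ord atom (n) that carries its own subscript (r).
$a_{n_r}$
\end{document}
```

The first line should give two identical results, while the braced version raises the 2 as the exponent of the whole x_i.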

Ord is for “ordinary” symbols such as variables. Op is for “operators” such as \sum or \log. Rel and Bin are for “relation” and “binary operation” symbols (such as < or +, respectively). Open and Close refer to fences such as parentheses. Punct is for punctuation signs (the comma or semicolon).

An Inner atom is basically built from \left...\right (and contains a subformula). Over results from \overline and Under from \underline. Acc comes from the primitive \mathaccent, which is called by commands such as \bar or \tilde. Rad stems from the \radical primitive, used internally by \sqrt. Vcent is a special object built from \vcenter.
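One visible consequence of the Inner type is spacing after a function name. The following is a minimal sketch; the \mathopen{}...\mathclose{} workaround is a commonly used idiom and not something described above.

```latex
\documentclass{article}
\begin{document}
% ( is an Open atom: no space is inserted between f and (.
$f(x)$

% \left(...\right) makes an Inner atom: a thin space is inserted
% between the Ord atom f and the Inner atom.
$f\left(x\right)$

% Empty Open and Close atoms around the Inner atom suppress that
% thin space while the delimiters can still grow.
$f\mathopen{}\left(x\right)\mathclose{}$
\end{document}
```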

An Op atom can be followed by the commands \displaylimits, \limits or \nolimits; no specification is equivalent to adding \displaylimits. In that case the subscript and superscript fields are typeset below and above the operator when the formula itself is typeset in display style (from $$...$$ or, in LaTeX parlance, \[...\] or similar environments) and beside the symbol in the other styles. There are also rules for possibly choosing a bigger version of the symbol in display style.
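The three modifiers can be compared directly. This is a minimal sketch with an arbitrary product; the bounds are made up for illustration.

```latex
\documentclass{article}
\begin{document}
% Text style, default (\displaylimits): scripts beside the operator.
$\prod_{i=1}^{n} a_i$

% Display style, default: scripts below and above, bigger symbol.
\[ \prod_{i=1}^{n} a_i \]

% \limits: scripts below and above even in text style.
$\prod\limits_{i=1}^{n} a_i$

% \nolimits: scripts beside the operator even in display style.
\[ \prod\nolimits_{i=1}^{n} a_i \]
\end{document}
```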

Any symbol or subformula can be made into an atom by specifying it as argument to \mathord, \mathop, \mathrel, \mathbin, \mathopen, \mathclose, \mathpunct or \mathinner. However \mathord{...} is equivalent to the simpler {...}.
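As a sketch of how the class changes the spacing, the same symbol can be forced into different atom types. \star is a binary operation in the standard LaTeX setup; the name argmax is just an example operator.

```latex
\documentclass{article}
\begin{document}
% Default: \star is a Bin atom, with medium spaces around it.
$a \star b$

% Forced to Rel: thick spaces around it.
$a \mathrel{\star} b$

% Forced to Ord (two equivalent spellings): no extra space.
$a \mathord{\star} b \quad a {\star} b$

% A whole subformula turned into an Op atom, so that \limits
% is allowed and the subscript goes below it.
$\mathop{\mathrm{argmax}}\limits_{x} f(x)$
\end{document}
```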

Your particular question is about \bar and \overline. Something like \bar{abc} becomes (temporarily) an Acc atom; the accent is placed above the whole subformula, but has no wider version, so it ends up covering just the b. With \widetilde it is different, because the \mathaccent command points to a glyph that has wider variants (this information is encoded in the font). With \overline{abc}, instead, a rule is drawn above the whole subformula, making a single Over atom (that will be later considered as Ord as far as spacing is concerned).
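The difference is easy to check side by side; the following minimal sketch only uses commands from the standard LaTeX kernel.

```latex
\documentclass{article}
\begin{document}
% Acc atom with a fixed-width accent: the bar is centered
% over the subformula and covers roughly the b only.
$\bar{abc}$

% Over atom: a rule as wide as the whole subformula.
$\overline{abc}$

% \tilde has a fixed width, \widetilde can pick a wider glyph
% from the font (up to the largest variant available).
$\tilde{abc} \quad \widetilde{abc}$

% One accent per letter needs one Acc atom per letter.
$\bar{a}\bar{b}\bar{c}$
\end{document}
```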

The input is first processed by assigning atom types according to internal tables, which make \sum an Op, = a Rel and so on. The math list so obtained is then reprocessed in order to add the suitable math spacing, after transforming Over, Under, Acc, Rad and Vcent atoms to Ord. Finally, it is processed again in order to transform it into “boxes and glue”.
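To look at the math list before it is converted, the TeX primitive \showlists can be placed at the end of a formula. This is a sketch for inspection only; the formula is an arbitrary example, and the exact form of the log output depends on the fonts in use.

```latex
\documentclass{article}
\begin{document}
% LaTeX sets these two parameters to -1, which hides the list
% contents, so raise them before asking for the listing.
\showboxbreadth=100
\showboxdepth=100
% Also print the listing on the terminal, not only in the log.
\tracingonline=1

% \showlists writes the current math list, with entries such as
% \mathord, \mathbin, \mathrel and \mathop, and then pauses like
% \show does (press Enter, or run in nonstop mode).
$a + b = \sum_i x_i \showlists$
\end{document}
```

In the listing, the subscript of \sum should appear as a field attached to the \mathop entry rather than as a separate atom.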

Appendix G of the TeXbook is devoted entirely to the rules for this processing.