[Tex/LaTex] Using \big| and \right| versus \bigr\rvert and \right\rvert

best-practices delimiters math-mode symbols

Based on the answers to this question (about \bigl, \bigr, \big, etc.) and this question (about \lvert, \rvert, |, etc.), I would think that any time a vertical bar is used as a right delimiter, one should write \rvert, \bigr\rvert, etc., or \right\rvert (or \mright\rvert). However, I feel like I am seeing these mixed and matched in some answers. Take for example the excerpt

\right|_{#2} % this is the delimiter

from this answer, or the excerpt

\NewDocumentCommand{\evalat}{sO{\big}mm}{%
  \IfBooleanTF{#1}
   {\mleft. #3 \mright|_{#4}}
   {#3#2|_{#4}}%
}

from this answer. Why is it appropriate to use the different combinations here?

Best Answer

OK, the short answer is in the comments; here is the looong one.

Please note: This answer applies to the original TeX by Knuth, to eTeX, and to pdfTeX. I don’t know how character input in math mode and math fonts are managed in other typesetting engines, e.g., in XeTeX. Moreover, in principle the inputenc package could interfere in the processes described below, and this is briefly discussed at the end.

Already existing answers that give useful information about this subject are, among others:


Character tokens, their math codes, and kinds of atom

As you know, when TeX is reading from the input file a formula to be typeset, the input tokens are processed in “math mode”. Most of these input tokens will be simple characters like “x”, “y”, “+”, “=”, and so on, representing syntactic units of the formula: for example, “x” and “y” are variables, “+” is a binary operation, “=” is a relation, etc. Of course, though, there are also many mathematical symbols that cannot be represented by the simple characters used in ordinary text, and are therefore input as control sequences: for example, \sum, \int, \cup, \cap, etc., but also \langle or \rangle. Obviously, | falls within the first case, while \vert, \lvert, and \rvert within the second.

Now, irrespective of the method used to input it, for each character that TeX must typeset in a formula, it needs to know:

  1. where to take the character from, that is, from which font and from which position within that font;

  2. which kind of syntactic entity the character represents, that is, whether it is a variable, a binary operation, a relation, and so on.

TeX needs the information mentioned in 2 because the spacing between adjacent characters in a formula depends on it: for example, in the formula ax+by=0, no space should be inserted between the variables “a” and “x”, or between “b” and “y”, but, on the contrary, the “+” and the “=” symbols should be separated from the surrounding elements by some amount of space (actually, “=” requires a thicker space than “+”).
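To see the effect of the atom kind on spacing, here is a minimal sketch; it only assumes a standard LaTeX installation, and it uses the \mathord, \mathbin and \mathrel primitives to override the kind that a symbol would normally have.

```latex
\documentclass{article}
\begin{document}
% Default kinds: letters are Ord, + is Bin, = is Rel.
$ax+by=0$

% Forcing + and = to be Ord atoms removes the extra space around them.
$ax\mathord{+}by\mathord{=}0$

% Swapping the kinds: + is now spaced like a relation, = like a binary operation.
$ax\mathrel{+}by\mathbin{=}0$
\end{document}
```

Comparing the three lines side by side shows that the glyphs stay the same and only the spacing changes.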

Now, the problem arises of how to specify all this information, for each of the possible input tokens, in a flexible and reconfigurable way, so that, for example, the convention that “+” is a binary operator and “=” a relation symbol is not hard-wired in the code of TeX itself. You can already guess that this is not at all a problem for control sequences like \cup, \vert, or \lvert: after all, control sequences can stand for arbitrary “programs”, so you can easily pack as much information as you want inside them (we’ll see the details below). But how is this information specified for simple characters like “x” or “+” (or “|”)?

Answer: by associating to each character a so-called “math code”. TeX maintains internally a table consisting of 256 entries, each of which can hold a 16-bit integer (although, with one exception, only 15-bit values are actually used): for each input character, the integer contained in its associated entry specifies the necessary information. (This is called the \mathcode table, and is very similar to other TeX tables, like the tables of \catcodes, of \sfcodes, of \uccodes, etc., you might already know about.) More precisely, if we represent such an integer as a string of four hexadecimal digits

kfpp

with k varying only between 0 and 7, then:

  • k gives the kind of the symbol being typeset: for example, 0 = ordinary symbol, 1 = large operator (like \sum), 2 = binary operator, and so on (see The TeXbook, p. 154, for the complete list);

  • f specifies the font, through an indirect mechanism that won’t be discussed here (and in which LaTeX2e’s NFSS plays its part);

  • pp indicates the position within that font.

The details of how this “\mathcode” table is set up and managed are much more complicated in LaTeX2e than they were in the plain TeX format described in The TeXbook, and even touching upon them here is infeasible (see the NFSS documentation). For the sake of answering our question, however, it suffices to know that TeX has “some place” to look at when it needs to know what kind of symbol it is about to typeset. Let us recap once more:

  • TeX looks at the \mathcode table only when it is processing a character token in math mode (actually, this statement should be refined, but let us pass over the TeXnicalities);

  • if so, it looks up the corresponding entry in the table, whose contents specify, among other things, the kind of syntactic entity that the character in question represents;

  • an atom of that kind is eventually appended to the current math list, possibly after attaching a superscript or a subscript to it.
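If you want to look at the table yourself, the following sketch typesets a couple of entries. It assumes pdfLaTeX with the default math fonts; other engines or font packages may well give different numbers. \the prints the value in decimal, so it has to be converted to hexadecimal by hand to read off the k, f and pp digits.

```latex
\documentclass{article}
\begin{document}
% \the\mathcode`\<char> typesets the table entry as a decimal number.
% With the customary setup, the entry for | should be 618, that is, "026A:
% k = 0 (ordinary), f = 2, pp = "6A.
Math code of the vertical bar: \the\mathcode`\|

% For + one expects k = 2 (binary operation); the value should be 8235 = "202B.
Math code of the plus sign: \the\mathcode`\+
\end{document}
```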


Math characters and kinds of atom

In text mode, you can specify a character to be typeset not only by including that literal character in the input, but also by means of the \char primitive, equivalent to LaTeX’s \symbol command. For instance, instead of bubble you might write, in your source file, \char98 u\char98 \char98 le, and obtain exactly the same result. Of course, the \char primitive is actually useful when you need to typeset a “strange” character like “¿”.

In a similar way, in math mode, you can use the \mathchar primitive to specify any math character (or math symbol) you want. But there is an important difference between \char and \mathchar: whilst after \char you specify only an 8-bit number, which gives just the internal code that represents the intended character, after \mathchar a 15-bit integer is expected, which contains exactly the same information that we would find in a \mathcode table entry: a kind, a font family, and a position, specified in exactly the same format. So, for instance,

\mathchar"1350

is a (primitive) command, valid only in math mode, that tells TeX to construct an Op[erator] atom (k = 1) containing the character found in font number 3 (f = 3), whatever this means, at position number 80 (pp = 50 hexadecimal). In the usual setting, this turns out to be the ∑ symbol.

Of course, \mathchar commands are never used directly, but through control sequences that have been defined to act as equivalent commands. For example, in the customary setting \sum has been made equivalent to \mathchar"1350, and this explains why typing \sum in your input file causes an Op atom containing the right symbol to be appended to the current math list (with possible sub/superscripts). The point to note, here, is that this time the kind/font/position information is not looked up in a table, as in the case of “bare” character tokens, but comes with the command itself, be it a \mathchar primitive or a higher-level command like \sum.
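As a hedged illustration of how such an equivalence can be set up, the sketch below uses the \mathchardef primitive. The name \mysum is hypothetical, and the number is the one quoted above; with the default fonts, the three summation signs should come out the same.

```latex
\documentclass{article}
\begin{document}
% \mysum is a hypothetical name: \mathchardef makes it equivalent to \mathchar"1350.
\mathchardef\mysum="1350
% First the new control sequence, then the bare primitive, then the usual \sum.
$\mysum_{i=1}^{n} a_i \quad \mathchar"1350_{i=1}^{n} a_i \quad \sum_{i=1}^{n} a_i$
\end{document}
```

Note that packages such as amsmath redefine \sum as a more elaborate macro, so this direct equivalence is only claimed for the plain setup.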

Now, always assuming that the customary conventions are in force, it turns out that, when used by themselves and not after \left or \big or \biggr or…:

  • \vert is eventually equivalent to \mathchar"026A, so it generates an Ord[inary] atom (k = 0) containing a character found at a certain position in a certain font;

  • \lvert is eventually equivalent to \mathchar"426A, so it generates an Open[ing] atom (k = 4) containing exactly the same character as above;

  • \rvert is eventually equivalent to \mathchar"526A, so it generates a Clos[ing] atom (k = 5) containing again the same character.

Moreover, one can see that the \mathcode associated with the “|” character is "026A too, so that a simple | in the input behaves, at least when it is used by itself and not after \right or \Bigm or \Biggl or…, exactly as \vert.
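The practical consequence of the different atom kinds shows up, for instance, with a minus sign right after the bar. The sketch below assumes the amsmath package, which is where \lvert and \rvert come from.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \lvert and \rvert
\begin{document}
% | and \vert give Ord atoms, so the minus sign between the bar and x
% is spaced as a binary operation.
$|-x|$ \quad $\vert -x\vert$

% \lvert gives an Open atom, so the minus sign right after it is treated as a sign.
$\lvert -x\rvert$
\end{document}
```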

But this is only a simplified version of the whole story: we shall complete the picture in the next two sections.


Delimiters

Some math symbols, like parentheses and root signs, are expected to grow with the size of the subformula they encompass, and therefore require a special treatment. For these, TeX provides the concepts of “delimiter” and of “radical”, of which only the first is of concern here.

At the primitive level (TeX’s “machine language” level, as it were), TeX treats a symbol as a delimiter only in a few, well-defined cases: after a \left or \right command, and in connection with certain primitive commands that deal with fractions. For example, ( by itself is typeset as a “normal” (i.e., non-delimiter) character, according to the rules detailed above, but \left( causes TeX to treat the parenthesis as a “delimiter”, that is, as a character that can grow. In order to be able to typeset such a growing character, TeX needs more information than it does in the case of a “normal” character, because it needs to know where the different sizes of the delimiter can be found. So, instead of looking at the math code (\mathcode) for that character, TeX searches another of its internal tables, which again contains an entry for each of the 256 possible character codes: each entry holds the so-called “delimiter code” (\delcode) of the associated character, which can be either a negative number, for characters, like “x” or “+”, that should never act as delimiters, or a non-negative 24-bit number, that is, a sequence of six hexadecimal digits

fppgqq

that specify two variants of the glyph in question, using a doubled version of a convention similar to the one used in the \mathcode table. More precisely, the first three digits (fpp) indicate the font family and the position where the smallest size of the glyph can be found, and the last three (gqq) specify in a similar way where the larger sizes can be found (actually, qq indicates the position within the font of the first larger variant; still larger variants could be available, which can be found starting from the first one by means of information included in the font metric file itself). It is particularly important to note that, in this case, the information about the syntactic nature of the symbol at hand is not included in its \delcode, because it is not needed: TeX already knows this information from the command that caused it to look for a delimiter (\left, \right, or other ones that we are not considering).

Let us further illustrate this concept with an example: When TeX encounters, in the input, the character token | by itself, it treats it as a “normal” symbol: it looks up its \mathcode, finds out that it is, say, "026A, and from the first digit of the \mathcode it learns that the character at hand should be appended to the current math list as (the contents of) an Ord[inary] atom; the other three digits of the \mathcode tell TeX where to find the appropriate glyph. On the other hand, when TeX encounters the input, say, \left|, it already knows, from the \left command itself, that an opening delimiter is required, and it only has the problem of finding the glyph; for this, and just for this, it looks at the \delcode of the ensuing |, from which it retrieves the necessary information.
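The delimiter codes can be inspected in the same way as the math codes. This is only a sketch, assuming pdfLaTeX with the default math fonts; the values are printed in decimal.

```latex
\documentclass{article}
\begin{document}
% With the customary setup the entry for | is expected to be 2532108,
% that is, "26A30C: small variant in family 2 at position "6A,
% first larger variant in family 3 at position "0C.
Delimiter code of the vertical bar: \the\delcode`\|

% A negative value means that the character can never act as a delimiter.
Delimiter code of the letter x: \the\delcode`\x
\end{document}
```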

This works fine if the delimiter can be specified by means of a character token, as is the case for round parentheses ((, )), square brackets ([, ]), or vertical bars (|); but what about delimiters like curly braces or double vertical bars? These, as you know, are specified through control sequences (\lbrace, \rbrace, \Vert or its synonym \|). Well, all of these control sequences are actually macros that expand to appropriate invocations of another primitive command called \delimiter, which is loosely analogous to \mathchar.


The \delimiter command

The primitive TeX command \delimiter must be followed by a 27-bit unsigned integer, that can be represented as a string of seven hexadecimal digits

kfppgqq

with k varying only between 0 and 7. This command can be used in all places where TeX is looking for a delimiter (that is, after \left, \right, and with some other primitive commands dealing with fractions), and in this case the rightmost six digits tell TeX where to find the glyph for the delimiter in exactly the same way as a \delcode would do. Now,

\left \vert

works precisely in this way: \vert is a macro that expands to \delimiter "026A30C, so the above line expands to

\left \delimiter "026A30C

and TeX knows that it has to construct an opening delimiter (because of the \left command) whose small variant is found in font family 2 (TeX knows which font this is) at position 106, and whose first larger variant is found in font 3, position 12. The question is: what the hell is the digit k provided for?

Well, we all know that we are allowed to use \vert not only after \left or \right (or \bigl, etc.) but also by itself, and in this case it is completely equivalent to a lone |. This is because the \delimiter command is permitted to appear also in places where TeX is not looking for a delimiter; in this case, the last three hexadecimal digits of the ensuing number are dropped, and the command behaves as if it had been a \mathchar. In other words, when it does not follow \left, etc.,

\delimiter "kfppgqq

acts exactly as

\mathchar "kfpp

This time, it is from the k digit that TeX learns which kind of atom it has to construct, and this is the reason for providing it. Thus, you can define \vert to expand to \delimiter "026A30C, and this definition will work in every situation.
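Here is a minimal sketch of such a definition in action. The name \myvert is hypothetical; the number is the one given above for \vert, so the output should match what \vert itself produces under the customary setup.

```latex
\documentclass{article}
\begin{document}
% \myvert is a hypothetical name; the space after the number terminates it.
\newcommand{\myvert}{\delimiter"026A30C }

% By itself: acts like \mathchar"026A, an Ord atom with the small glyph.
$a \myvert b$

% After \left and \right: the digit k is ignored, and the bar can grow.
\[ \left\myvert \frac{a}{b} \right\myvert \]
\end{document}
```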


The answer, at last!

We are finally in a position to answer the question that has been asked. Consider the following input samples:

  • | by itself: TeX looks at the \mathcode associated with |, from which it learns both where to find the relevant glyph and which kind of atom to construct.

  • \left| or \right|: TeX already knows that it has to construct a left (resp., a right) delimiter, and looks at the \delcode of | just in order to learn where to find the necessary glyph(s).

  • \vert by itself: this is a macro which expands to \delimiter "026A30C; in this context, this acts as if it had been \mathchar "026A, and the first digit of the number "026A tells TeX which kind of atom to construct (the following three, where to find the glyph, its small variant being always used in this case). So, an Ord[inary] atom (k = 0) is constructed here.

  • \left\vert or \right\vert: the first of these expands to \left \delimiter "026A30C. TeX already knows that a left delimiter is being asked for, and therefore ignores the first digit of the number "026A30C, and uses the remaining digits to learn where to find the glyph(s) it needs. The effect of \right\vert is analogous.

  • \lvert by itself: this is a macro which expands to \delimiter "426A30C; in this context, this acts as if it had been \mathchar "426A, and the first digit of the number "426A tells TeX which kind of atom to construct, which, this time, is an Open[ing] atom (k = 4); the following three digits tell TeX where to find the appropriate glyph, its small variant being always used in this case.

  • \left\lvert or \right\lvert: the first of these expands to \left \delimiter "426A30C. TeX already knows that a left delimiter is being asked for, so it ignores the first digit of the number "426A30C, and uses the remaining six just to locate the necessary glyph(s). Similarly for \right\lvert.

  • \rvert by itself: exercise. (Hint: \rvert expands to \delimiter "526A30C; k = 5 means Clos[ing] atom).

  • \left\rvert or \right\rvert: exercise.

From the above we see that (assuming the customary \mathcodes and definitions) \left|, \left\vert, \left\lvert, and even \left\rvert are all exactly the same thing. And the same for \right.
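This is easy to check on the kind of “evaluated at” construction mentioned in the question. The sketch below assumes the amsmath package (for \lvert and \rvert); under the customary definitions the four results should be indistinguishable.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \lvert and \rvert
\begin{document}
% After \right, only the glyph information is used, so the kind digit
% of \vert, \lvert or \rvert plays no role.
\[
  \left. \frac{df}{dx} \right|_{x=0}      \quad
  \left. \frac{df}{dx} \right\vert_{x=0}  \quad
  \left. \frac{df}{dx} \right\rvert_{x=0} \quad
  \left. \frac{df}{dx} \right\lvert_{x=0}
\]
\end{document}
```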


What about \bigl and relatives?

The control sequences \big, \bigl, \bigm, \bigr, \Big, etc. are macros with one argument, each of which manufactures an atom of a predetermined kind that contains a “pseudo-delimiter” (i.e., not a delimiter in the TeXnical sense) of a predetermined size; they do so by means of a \left\right construction that encompasses just an empty box of the appropriate size, which is explicitly wrapped by a \mathord, \mathrel, \mathopen, or \mathclose command (\mathord isn’t actually used, because it is “implied by default”). More precisely, as you certainly already know:

  • the \big, \Big, … series generates Ord[inary] atoms;

  • the \bigl, \Bigl, … series generates Open[ing] atoms;

  • the \bigm, \Bigm, … series generates Rel[ation] atoms;

  • the \bigr, \Bigr, … series generates Clos[ing] atoms.

Here, therefore, the argument is used just to locate the intended glyph.
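The mechanism can be imitated with a few lines of code. What follows is a simplified sketch modelled on the plain TeX definitions, not the code actually used by LaTeX or amsmath (which adapt the size to the current font size); the names \mybig, \mybigl and \mybigr are hypothetical, and the height 8.5pt is the plain TeX value for the \big size.

```latex
\documentclass{article}
\begin{document}
% \left#1 ... \right. around an empty box 8.5pt high selects the glyph size;
% the outer braces turn the resulting box into a single (Ord) atom.
\newcommand{\mybig}[1]{{\hbox{$\left#1\vbox to 8.5pt{}\right.
  \nulldelimiterspace=0pt \mathsurround=0pt$}}}
% The kind of atom comes from the wrapper, not from the argument.
\newcommand{\mybigl}{\mathopen\mybig}
\newcommand{\mybigr}{\mathclose\mybig}

% Imitation on the left, the real commands on the right, for comparison.
$\mybigl| x \mybigr| \quad \bigl| x \bigr|$
\end{document}
```

Since the argument only ever ends up after \left, it makes no difference whether | or \vert or \rvert is passed to it.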


A word about inputenc

For simplicity, in the above description we did not mention the fact that the \mathcode table lookup happens only for character tokens whose category code is either 11 (letter) or 12 (other). Remember that the inputenc package makes characters in positions 128 … 255 active, and that encoding definition files can assign a particular meaning to some of these active characters, when used in math mode. Of course, the substitution of such an active character with its meaning happens before the processes described above take place. To give just one example, the file latin1.def contains, among many others, the declaration

\DeclareInputMath{177}{\pm}

so that character number 177 will always be equivalent, when used in math mode, to the control sequence \pm. This substitution is, of course, completely independent of the \mathcode machinery.
