The concept of boxes in TeX

Tags: box, boxes, characters, symbols

Am I right that TeX treats text in terms of boxes (rectangles), where each character is in a rectangle with three dimensions (height, width and depth) and an origin point?
And that each box contains not the character itself but information about it (its ASCII code, etc.)?
So TeX's job is to place the boxes correctly according to their metrics (geometry) and to build a DVI file that records where to put each character, which character it is, its size and which font to use? And a DVI driver (viewer) then only has to visualize that information, so that visualization is separated from typesetting?
And that a character, a word, a line, a paragraph and a page are all boxes with the same structure (a rectangle with height, width and depth, and an origin)?
So the box concept is recursive: a sub-box is a smaller copy of its parent box, and the smallest unit is a box holding a single character?

Best Answer

Abstractly, your description is more or less right. Classical TeX has no information about character shapes from the font. It deals only with font metrics, where each character is described by four numbers (height, depth, width and italic correction). However, while the overall model is often known as "boxes and glue", within TeX itself characters form a separate node type. Boxes proper hold horizontal or vertical lists (which, as you note, can be nested), and while characters and rules are somewhat box-like, they are not actually boxes.

So \setbox0=\hbox{a} saves a box with a horizontal list containing the character node for a. That character is not itself a box and you can't do \setbox0=a.
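
The distinction can be inspected directly. The following is a minimal plain TeX sketch, not part of the original answer: it assumes the default plain TeX fonts and uses box registers 0 and 2 purely as scratch registers. It reports the dimensions of a one-character box and then dumps a nested box to the terminal and log, where the character nodes appear as entries inside the horizontal lists rather than as boxes of their own.

```latex
% Plain TeX sketch; run with `tex` or `pdftex` and read the terminal/log output.
\scrollmode                      % do not pause at \showbox
\tracingonline=1 \showboxbreadth=100 \showboxdepth=100
% A box whose horizontal list contains a single character node.
% (\setbox0=a would be an error: a character is not a box.)
\setbox0=\hbox{a}
\message{a: wd=\the\wd0, ht=\the\ht0, dp=\the\dp0}
% Boxes nest: a vertical list holding two horizontal boxes.
\setbox2=\vbox{\hbox{ab}\box0 }
\showbox2                        % dumps the nested list structure
\bye
```

In the `\showbox` output each nesting level is indented with a dot, which makes the recursive list structure visible.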

Related Solutions

[Tex/LaTex] How to clip a TeX box using low-level PS commands

Both Herbert and Alexander have offered solutions for dvips. Here, I'm taking inspiration from those answers, from the more convenient approach available in pdfTeX, and from a modified version of the pgf method for XeTeX, and combining them into a single approach. First, note that I'm assuming e-TeX and also using a somewhat 'LaTeX3-like' programming approach. I've also shared as much code as possible between the drivers.

I'll use a single example but add comments along the way. First, as there are packages for driver detection, I'll load those.

\documentclass{article}
\usepackage{ifpdf,ifxetex}
\makeatletter

The main internal macro takes five arguments: the box to modify, then four dimension expressions to be clipped off the left, bottom, right and top respectively. You could also set up a similar approach to take a final size. The idea here is that the baseline is respected if possible, so there is a bit of care needed with the vertical placement so that the content only moves down while there is some depth available.

\protected\long\def\box@clip#1#2#3#4#5%
  {%
    \ht#1\dimexpr\ht#1 - \dimexpr#5\relax\relax
    \ifdim\dp#1>\dimexpr#3\relax
      \dp#1\dimexpr\dp#1 - \dimexpr#3\relax\relax
    \else
      \setbox#1=\hbox
        {\lower\dimexpr\dimexpr#3\relax - \dp#1\relax\box#1}%
      \dp#1\z@
    \fi
    \wd#1\dimexpr\wd#1-\dimexpr#4\relax\relax
    \setbox#1=\hbox
      {%
        \hskip-\dimexpr#2\relax
        \box#1%
      }%
    \ifxetex
      \expandafter\box@clip@xdvipdfmx
    \else
      \ifpdf
        \expandafter\expandafter\expandafter\box@clip@pdfmode
      \else
        \expandafter\expandafter\expandafter\box@clip@dvips
      \fi
    \fi
    #1%
  }
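
The dimension bookkeeping that \box@clip performs before handing over to a driver-specific auxiliary can be tried in isolation. The following is only a sketch under stated assumptions: \demobox is a hypothetical scratch box, the clip amounts are hard-coded to match the test further down, and no driver-level clipping is applied, so the ink is not actually cut off; only the recorded dimensions change.

```latex
% Standalone sketch: dimension arithmetic only, no \special or PDF clipping.
\documentclass{article}
\begin{document}
\newbox\demobox % hypothetical scratch box, not part of the answer's code
\setbox\demobox=\hbox{Some test text with (g)}
\typeout{before: wd=\the\wd\demobox, ht=\the\ht\demobox, dp=\the\dp\demobox}
% Top: reduce the recorded height by 2pt.
\ht\demobox=\dimexpr\ht\demobox-2pt\relax
% Bottom: take 2pt from the depth if there is enough of it,
% otherwise lower the content and set the depth to zero.
\ifdim\dp\demobox>2pt
  \dp\demobox=\dimexpr\dp\demobox-2pt\relax
\else
  \setbox\demobox=\hbox{\lower\dimexpr2pt-\dp\demobox\relax\box\demobox}%
  \dp\demobox=0pt
\fi
% Right: reduce the recorded width by 5pt.
\wd\demobox=\dimexpr\wd\demobox-5pt\relax
% Left: shift the content 10pt to the left inside a new box.
\setbox\demobox=\hbox{\hskip-10pt\box\demobox}
\typeout{after: wd=\the\wd\demobox, ht=\the\ht\demobox, dp=\the\dp\demobox}
\noindent\box\demobox
\end{document}
```

Comparing the two \typeout lines shows how much each side was reduced. The typeset result still shows the full text overprinting its surroundings, which is why the driver-specific clipping step is needed.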

For each driver supported, there is an auxiliary. First, for dvips this is Herbert's method slightly altered (pgf makes things very complex):

\protected\long\def\box@clip@dvips#1%
  {%
    \setbox#1=\hbox
      {%
        \special
          {%
            ps:
              /mtrxc matrix currentmatrix def 
              currentpoint gsave
              translate
              Resolution 72 div VResolution 72 div scale
              0 -\to@bp{\dp#1} neg  \to@bp{\wd#1} \to@bp{\ht#1 + \dp#1} neg
              rectclip
              mtrxc setmatrix 
          }%
        \box#1%
        \special{ps: grestore }%
      }%
  }

As I said, pdfTeX makes life very easy :-)

\protected\long\def\box@clip@pdfmode#1%
  {%
    \pdfxform#1%
    \setbox#1=\hbox{\pdfrefxform\pdflastxform}%
  }

XeTeX is possibly the most complex one to tackle. The pgf approach is used, but here I've removed a lot of unnecessary transformations. After reading the dvipdfmx manual, it becomes clear that the best approach is as follows

\protected\long\def\box@clip@xdvipdfmx#1%
  {%
    \setbox#1=\hbox
      {%

The first special saves the current point and starts a new 'graphics level'. Using the bcontent operation saves the current location automatically.

        \special{pdf:bcontent }%

Draw a rectangle the size of the modified box: in pgf this is done using the lower-level m, l and h operations, but there is no gain in working that way. This will be located at 'current point' TeX-wise.

          \special
            {%
              pdf:literal direct 
                0 -\to@bp{\dp#1} \to@bp{\wd#1} \to@bp{\ht#1 + \dp#1} re 
            }%

The W operation specifies a clip, and n finalises the path without any output (it's a 'no-op').

          \special{pdf:literal direct W }%
          \special{pdf:literal direct n }%

Insert the box and tidy up.

                \box#1%
              \special{pdf:econtent }%
      }%
  }

A simple conversion, taken from "Is there a command to convert cm to bp?":

\long\def\to@bp#1{\strip@pt\dimexpr0.99626\dimexpr#1\relax\relax}
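
As a quick check of the conversion factor (1bp = 72.27/72 pt, so 1pt is roughly 0.99626bp), the macro can be exercised on its own. This is a sketch that repeats the definition so that it runs standalone; the test values are chosen here for illustration and are not from the original answer.

```latex
\documentclass{article}
\makeatletter
\long\def\to@bp#1{\strip@pt\dimexpr0.99626\dimexpr#1\relax\relax}
% 72.27pt is one inch, i.e. 72bp, so this should print a value close to 72.
\typeout{72.27pt in bp: \to@bp{72.27pt}}
% Dimension expressions are accepted too, as used in the clipping macros;
% two inches should come out close to 144.
\typeout{1in + 1in in bp: \to@bp{1in + 1in}}
\makeatother
\begin{document}
\end{document}
```

The result is a bare number without a unit, which is what the PostScript and PDF operators in the driver code expect. The macro expands fully, so it can be used inside \special and \pdfliteral arguments.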

Wrap everything up in a user macro and finish the code block

\protected\long\def\boxclip#1#2#3#4#5{\box@clip#1{#2}{#3}{#4}{#5}}
\makeatother

\newbox\testbox

Now for some testing.

\begin{document}
\setbox\testbox=\hbox{Some test text with (g)}
\boxclip{\testbox}{10 pt}{2pt}{5pt}{2 pt}

\noindent\box\testbox{}

\end{document}

(I'll be adding this to LaTeX3 now I know how it works!) If you want to see the effect of various parts of the XeTeX code, comment out the W line to turn off the clipping. You can also replace the n operation with s so that a box is drawn where the clipping path is.

In earlier versions of the answer, for XeTeX I used the content q operation to save the current location, but this requires a series of manipulations to get the clip path and the box insert to line up. Using the bcontent ... econtent pair is much clearer.

While using the XForm implementation for pdfTeX is convenient, an alternative approach is possible in a case where code is to be shared between branches. (The XForm approach could in any case be viewed as an abuse of the XForm object system.)

\protected\long\def\box@clip@pdfmode#1%
  {%
    \setbox#1=\hbox
      {%
        \pdfsave
          \pdfliteral direct
            {%
              0 -\to@bp{\dp#1} \to@bp{\wd#1} \to@bp{\ht#1 + \dp#1} re W n
            }%
          \hbox to 0pt{\copy#1\hss}%
        \pdfrestore
        \hskip \wd#1
      }%
  }

The zero-width box here is used to keep placement correct in the \pdfsave/\pdfrestore pair (which performs the same task as the bcontent/econtent pair for XeTeX, but whose two halves must be placed at the same output position).

[Tex/LaTex] Bounding box for each letter

A LuaTeX solution. Should work in all situations that I am aware of:

\documentclass{article}
\usepackage{luacode,luatexbase}
\begin{document}
\begin{luacode*}
local GLYPH_ID = node.id("glyph")

local number_sp_in_a_pdf_point = 65782

function math.round(num)
  return math.floor(num * 1000 + 0.5) / 1000
end

-- width/height/depth of a glyph and the whatsit node
local wd,ht,dp,w

-- head is a linked list (next/prev entries pointing to the next node)
function showcharbox(head)
  while head do
    if head.id == 0 or head.id == 1 then
      -- a hbox/vbox
      showcharbox(head.list)
    elseif head.id == GLYPH_ID then
      -- Create a pdf_literal node to draw a box with the dimensions
      -- of the glyph
      w = node.new("whatsit","pdf_literal")
      wd = math.round(head.width  / number_sp_in_a_pdf_point)
      ht = math.round(head.height / number_sp_in_a_pdf_point)
      dp = math.round(head.depth  / number_sp_in_a_pdf_point)

      -- draw a dashed line if depth not zero
      if dp ~= 0 then
        w.data = string.format("q 0.2 G 0.1 w 0 %g %g %g re S f [0.2] 0 d 0 0 m %g 0 l S Q",-dp,-wd,dp + ht,-wd)
      else
        w.data = string.format("q 0.2 G 0.1 w 0 %g %g %g re S f Q",-dp,-wd,dp + ht,-wd)
      end
      -- insert this new node after the current glyph and move pointer (head) to
      -- the new whatsit node
      w.next = head.next
      w.prev = head
      head.next = w
      head = w
    end
    head = head.next
  end
  return true
end

luatexbase.add_to_callback("post_linebreak_filter",showcharbox,"showcharbox")
\end{luacode*}

A \emph{wonderful} serenity has taken {\large possession} of my entire soul, like these
\textsl{sweet}
\textbf{mornings} of spring which I enjoy with my whole heart. I am alone, and feel the
charm of existence in this spot, \textbf{which} was created for the bliss of souls like
mine. I am so happy, my dear friend, so absorbed in the exquisite sense of
mere tranquil existence, that I neglect my talents. I should be incapable of
drawing a single stroke at the present moment; and yet I feel that I never was
a greater artist than now.

\end{document}

which yields:

[Image: text with boxes]

(detail)

[Image: detail on text with boxes]

Bonus: it draws the base line if the depth of the glyph is not 0.
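
For comparison, when only a quick visual check of a few characters is needed, a frame can be drawn without Lua. This is a different and much cruder technique than the callback above, given as a sketch: each character has to be wrapped by hand, and the frame rule adds its own thickness to the box, so spacing changes slightly.

```latex
\documentclass{article}
\begin{document}
\setlength{\fboxsep}{0pt}    % no padding: the frame sits on the edges of the box
\setlength{\fboxrule}{0.1pt} % thin rule, similar to the line width used above
\fbox{g}\fbox{a}\fbox{f}
\end{document}
```

The frames show the metric box (height, depth and width from the font metrics), not the ink extent, so parts of a glyph may extend beyond the frame.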

Here is a solution that replaces the glyphs by black rectangles (rules):

\documentclass{article}
\usepackage{luacode,luatexbase,microtype}
\begin{document}
\begin{luacode*}
local GLYPH_ID = node.id("glyph")

-- head is a linked list (next/prev entries pointing to the next node)
-- parent is the surrounding h/vbox
function showcharbox(head,parent)
  while head do
    if head.id == 0 or head.id == 1 then
      -- a hbox/vbox
      showcharbox(head.list,head)
    elseif head.id == GLYPH_ID then
      local r = node.new("rule")
      r.width  = head.width
      r.height = head.height
      r.depth  = head.depth

      -- replace the glyph by
      -- the rule by changing the
      -- pointers of the next/prev
      -- entries of the rule node
      if not head.prev then
        -- first glyph in a list
        parent.list = r
      else
        head.prev.next = r
      end

      if head.next then
        head.next.prev = r
      end
      r.prev = head.prev
      r.next = head.next

      -- now the glyph points to
      -- nowhere and we should remove
      -- it from the memory
      node.free(head)

      head = r
    end
    head = head.next
  end
  return true
end

luatexbase.add_to_callback("post_linebreak_filter",showcharbox,"showcharbox")
\end{luacode*}

\hsize6cm

A wonderful serenity has taken possession of my entire soul, like these sweet
mornings of spring which I enjoy with my whole heart. I am alone, and feel the
charm of existence in this spot, which was created for the bliss of souls like
mine. I am so happy, my dear friend, so absorbed in the exquisite sense of
mere tranquil existence, that I neglect my talents. I should be incapable of
drawing a single stroke at the present moment; and yet I feel that I never was
a greater artist than now.

\end{document}

[Image: glyphs replaced by black rules]
