[TeX/LaTeX] Book on a Single (Poster) Page


There are commercially available posters which give the whole text of a book all at once, with some clever text flowing, like this one:

hobbit poster

Supposing I had the text of Kafka's Metamorphosis and a picture whose outline the text should follow, like:

x8se9.jpg

How could I do this in LaTeX?

Best Answer

This question is a great challenge!

full2.pdf

As TeX is a powerful and extensible system (especially with LuaTeX), this is possible (as above).

Disclaimer

But before describing a solution, let me first dissuade one from using TeX for this job:

  • This sort of thing is not what TeX and LaTeX were designed for (beautiful books and structured documents, respectively). So when you use (La)TeX for this purpose, you'll be fighting the system to some extent. For example, the stretchable vertical glue that TeX inserts between paragraphs is good for producing flush-bottom pages with decent page breaks that avoid widows and orphans, but it makes it hard to know where a certain paragraph will fall on the page. Similarly, TeX's feature of discarding any glue at the start of a page makes sense for books, but in this case we have to find workarounds that suppress that feature. So it's better to use some other system, one which is designed with different sorts of page layout applications in mind.

  • For an intriguing example of "another system", that is still TeX-based, consider Speedata Publisher, by Patrick Gundlach aka topskip. (See examples and GitHub.) As I was writing the LuaTeX code below and searching for related code online, this came up often. It seems to be a more “production-quality” system compared to the exploratory attempts of mine below. (I have not used it and don't know whether it has an easy way of accomplishing the particular task in the question; just saying that there exist other systems and not everything has to be done with TeX.)

  • For truly pleasing results for this job, you need good design. That is what Spineless Classics (the makers of the poster in the question) have done, and why, if you're interested in having such posters, their posters are worth buying. (For example, their poster of Alice typesets The Mouse's Tale the way it should be.) Even if you want to make the poster on your own, you probably want to use a page layout program with a graphical interface, which makes it easier to do design work (such as making incremental changes and getting immediate visual feedback). See the answer by RobtAll for the experience you can expect with such software.

Nevertheless, the question of how to do it from within an existing TeX system is an interesting one, and that's what I've answered below.


Usage

With the code below and with ImageMagick installed, this is how we can generate the typesetting above.

  1. First, to determine approximately the number of pages we will need, take our input file (here we can just take pg5200.txt after removing some extraneous stuff, especially the line containing “EBook #5200” which confuses TeX because of the category code of #) and typeset it.

    \input pg5200.txt
    \bye
    

    In this case, it comes out to about 25 pages. This suggests a layout of 5x5 pages, and considering some slack for the blank area (the image cutout) we shall use 5x6. (We could also use 6x5 which would fill the A0 page better.)

  2. Download the image for the cutout. In this case, I'll use x8se9.jpg from the question.

  3. Next, typeset the input file with LuaTeX. As we don't need any special features of LaTeX except ltluatex, we can just use plain TeX to keep it simple: compile the following with luatex -shell-escape kafka.tex

    \input{ltluatex} % For luatexbase.add_to_callback
    \directlua{dofile('pages-cutout.lua')}
    \directlua{pagesWithCutout('pg5200.txt', 5, 6, 'x8se9.jpg')}
    \bye
    

    You could use LaTeX if you prefer, though I haven't tested it with fancy LaTeX documents. In that case pass true as the fifth argument (is_latex), so that the setup issues \pagestyle{empty} instead of plain TeX's \nopagenumbers, and compile the following with lualatex -shell-escape kafka.tex:

    \documentclass{article}
    \usepackage[margin=1in]{geometry}
    \begin{document}
    \directlua{dofile('pages-cutout.lua')}
    \directlua{pagesWithCutout('pg5200.txt', 5, 6, 'x8se9.jpg', true)}
    \end{document}
    
  4. Finally, arrange the pages of kafka.pdf together, using pdfpages:

    \documentclass{article}
    \usepackage[a0paper,margin=0cm]{geometry}
    \usepackage{pdfpages}
    \begin{document}
    \includepdf[pages=-,nup=6x5,column,delta=-3cm -4cm]{kafka.pdf}
    \end{document}
    

    This you could compile with any engine (lualatex or xelatex or pdflatex). The result is as in the image above.

Clearly all the magic happens in Step 3 in pages-cutout.lua, so the rest of this answer gives the code and explains it.


History of paragraph shapes in TeX

(Feel free to skip this section if you're not interested in history.)

TeX itself has a basic \parshape feature which can be used with a paragraph, to specify the width and indentation of each line in that paragraph.

In TUGboat 8:1, April 1987, Donald Knuth and Alan Hoenig simultaneously published Problem for a Saturday Morning and TeX Does Windows: A Progress Report respectively. The problem was the same: typesetting a paragraph that contains a rectangular “hole” or “window”. In the next issue (TUGboat 8:2, July 1987), the solutions were published.

Knuth's one-page solution was to typeset the paragraph with a certain parshape (left-aligned and right-aligned short lines), then set the page height (\vsize) really small so that the output routine would get called after each line, and redefine the output routine (\output) so that instead of shipping out the pages, it would first collect these line-height “pages” in boxes and finally typeset them with \vskip-\baselineskip in the appropriate places for overlap. (With LuaTeX we don't need this trick of invoking \output; we can just use callbacks like post_linebreak_filter.)

Hoenig's solution comes with a useful illustration of a “before” and “after”:

Hoenig's solution

His solution was to use \parshape as well, to generate the top paragraph in the figure above, then use \vsplit to pull out the top (“lintel”), middle (“sides”) and bottom (“sill”) parts of the paragraph around the “window”, then again repeatedly \vsplit to \baselineskip on the “sides” to get the individual lines into boxes that are assembled together. This solution became the basis of the cutwin package.

Then there is the shapepar package. Its documentation does not describe the ideas that go into it, and I have not read shapepar.sty in enough detail to understand anything beyond the fact that it too uses \parshape. This package is really cool and sophisticated, and works with both LaTeX and plain TeX. I seriously considered using it for the typesetting of individual paragraphs in this problem. But I wasn't able to figure out how to translate an arbitrary image (or pattern of scan lines) into its shape specification (I'm sure it's possible though), so I gave up and wrote my own implementation.


Outline of code

(Personally I think ideas are more important than code, so I see problems with the TeX/LaTeX community's approach of often providing only “packaged” solutions for end-users, without more effort towards sharing the knowledge/tricks that would help others learn from and build on the solutions. So I'll try to explain everything below.)

The core idea is the following:

  • For each paragraph, we need to typeset that paragraph with the appropriate “shape”. (Given the shape, this becomes the problem mentioned in the History section above.) How can we determine this shape?

  • We can determine the shape for a paragraph when we reach the start of that paragraph, by knowing the overall image, the number of rows and columns (of pages) that the original image is split into, what page this paragraph falls on, and where this paragraph falls on the page. (That is, each paragraph's shape is determined as an offset into the global image.) The paragraph's position on the page is given by how much has already been put on the page, which is available in the TeX dimension \pagetotal.

  • What does it mean to determine an image, for our purposes? What we ultimately want, when typesetting a paragraph, is to know for each line where text goes and where the “holes” go. This means quantizing the original image into binary (black-and-white / holes-and-text) runs of lengths, with height that of the lines (\baselineskip). For good fidelity to the image we could do this at the start of each paragraph, but for speed we could do it once at the beginning (assuming a constant line height say), and then index the paragraphs as offsets into the global image data.


Example

To get the paragraph with the following shape:

shaped par

we first determine the proper shape, use an appropriate \parshape to line-break as follows, then insert negative glue so that the lines overlap:

shaped par before negative glue
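
The effect of the negative glue can be modelled in plain Lua, outside TeX. In the sketch below (the function name visual_rows and its input format are assumptions made for illustration, not part of pages-cutout.lua), every line box advances the position by one \baselineskip, and a glue count of -1 before a box pulls it back onto the row of the previous box:

```lua
-- Hypothetical helper: given, for each line box, the number of \baselineskip
-- glue units inserted before it (0 = ordinary next line, -1 = overlap the
-- previous line, k > 0 = k blank lines first), return the visual row of each box.
local function visual_rows(glue_counts)
   local rows = {}
   local row = 0
   for i, glue in ipairs(glue_counts) do
      row = row + 1 + glue   -- the box itself advances by one line
      rows[i] = row
   end
   return rows
end

-- Two side-by-side pieces, then a full line, then two blank rows and a line.
local rows = visual_rows({0, -1, 0, 2})
print(table.concat(rows, ' '))  -- prints "1 1 2 5"
```

The first two boxes land on the same row, which is exactly the overlap that turns the tall, narrow line-broken paragraph into the shaped one.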


Code walkthrough / detail

(The code snippets below are pages-cutout.lua with some stuff removed; the whole code as a single file is linked later below.)

The position of a paragraph

At the start of a paragraph (say in pre_linebreak_filter or linebreak_filter), we can get the position of the paragraph as follows. By convention the page number is available in \count0, directly accessible in LuaTeX as tex.count[0]. The height of material already added to the page (before this paragraph) is available in the TeX dimension \pagetotal (or tex.pagetotal). (The other dimensions in the family, namely pagefilstretch, pagefillstretch, pagefilllstretch, pageshrink, pagedepth and pagegoal, may be of interest too if we have stretchable/shrinkable vertical glue on the page, which we'll avoid here.)

In terms of the original image chosen for the cutout, we'd like to know the region of the image starting from where the paragraph starts in the image, to the end of the current page. If our page occurs on row r (of R) and column c (of C) among the total pages, then, given the pagetotal t out of the page's vertical height (vsize) v, the fraction of the image we are interested in is the rectangle with the following corners (in (x,y) coordinates with (0,0) at the top-left corner and y increasing down the page):

((c-1)/C, (r - 1 + t/v)/R)               (c/C, (r - 1 + t/v)/R)
                             . . .
((c-1)/C, r/R)                           (c/C, r/R)

These are the coordinates relative to the total area of the image. In absolute terms, if we scale the image so that 1 pixel is the height h of one line of text (which can itself vary, but let's take the line height as of the start of this paragraph), then that's a scaling factor of (v / h) * R (the total number of lines) divided by the image's original height in pixels.
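
As a worked instance of this scaling, here is a small sketch that can be run with a standalone Lua interpreter. It assumes plain TeX's default \vsize of 8.9in and \baselineskip of 12pt, and the 5 rows of pages used in this answer; the function name resized_height_in_pixels is made up for illustration. Dimensions are in scaled points (65536sp = 1pt), which is how LuaTeX reports them.

```lua
local SP_PER_PT = 65536
local PT_PER_IN = 72.27

-- Hypothetical helper: image height in pixels such that one pixel row
-- corresponds to one line of text across all `num_rows` rows of pages.
local function resized_height_in_pixels(vsize_sp, baselineskip_sp, num_rows)
   local lines_per_page = math.floor(vsize_sp / baselineskip_sp)
   return lines_per_page * num_rows
end

local vsize = 8.9 * PT_PER_IN * SP_PER_PT   -- plain TeX default \vsize
local baselineskip = 12 * SP_PER_PT         -- plain TeX default \baselineskip
print(resized_height_in_pixels(vsize, baselineskip, 5))  -- prints 265
```

That is 53 whole lines per page times 5 rows, which matches the -resize x265 used with convert further down.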

(As a paragraph can span multiple pages, we need to include the next page as well. If you have paragraphs that span even more pages, you should include those too.)

local utils = require('utils.lua')
global = {}

-- Rest of current page + next page
function image_areas_for_paragraph(resized_image_height)
   local area1 = image_offsets(tex.count[0], tex.pagetotal / tex.vsize, resized_image_height)
   local area2 = image_offsets(tex.count[0] + 1, 0, resized_image_height)
   return area1, area2
end

-- Which row and column a given page number falls on.
local function row_column(page_number)
   local current_column = math.ceil(page_number / global.num_rows)
   local current_row = page_number - (current_column - 1) * global.num_rows
   return current_row, current_column
end

-- The crop area, for this page number and page-filled fraction
function image_offsets(page_number, f, resized_image_height)
   local r, c = row_column(page_number)
   if r > global.num_rows or c > global.num_columns then
      return nil
   end
   local C = global.num_columns
   local R = global.num_rows
   local x1 = (c - 1)/C
   local y1 = (r - 1 + f)/R
   local x2 = c/C
   local y2 = r/R
   local image_resize_ratio = resized_image_height / global.image_height
   local resized_image_width = image_resize_ratio * global.image_width
   local x_start = math.floor(x1 * resized_image_width + 0.5) -- The left edge of the “page” (text area) starts here
   local y_start = math.floor(y1 * resized_image_height + 0.5)
   local x_end = math.floor(x2 * resized_image_width + 0.5) -- The right edge of the “page” (text area) ends here
   local y_end = math.floor(y2 * resized_image_height + 0.5)
   if y_start >= resized_image_height then
      return nil
   end
   local offset_string = string.format("%dx%d+%d+%d", x_end - x_start + 1, y_end - y_start + 1, x_start, y_start)
   return offset_string
end
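
To see the page ordering that row_column assumes, here is a standalone variant that takes the number of rows as a parameter instead of reading the global table (the name row_column_for is an assumption for this sketch). Pages fill the grid column by column, which is why the pdfpages step uses the column option.

```lua
-- Standalone variant of row_column: pages run down each column first.
local function row_column_for(page_number, num_rows)
   local column = math.ceil(page_number / num_rows)
   local row = page_number - (column - 1) * num_rows
   return row, column
end

-- With 5 rows: pages 1..5 form column 1, pages 6..10 form column 2, and so on.
for _, page in ipairs({1, 5, 6, 30}) do
   local r, c = row_column_for(page, 5)
   print(string.format('page %d -> row %d, column %d', page, r, c))
end
```

With 5 rows and 6 columns, page 30 is the last cell (row 5, column 6); a page 31 would land in column 7, which is the case image_offsets rejects by returning nil.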

Getting the image in a usable form

Now that we know the region of the image we're interested in, and even how to scale it so that one row of pixels corresponds to one line of text, it remains to convert this region of the image into binary data (saying where text can go and where whitespace must go) usable from inside LuaTeX. Our secret weapon here is the portable bitmap format (PBM). The original image, converted to PBM with something like:

convert x8se9.jpg -compress none bw.pbm

(here convert is ImageMagick) produces a plain-ASCII file that can both be viewed in an image viewer:

default bw.pbm

and be viewed in a text(!) editor (click on the image to view it full-size):

Screenshot of Emacs

So as you can see, the really simple format of the PBM file—just 0s and 1s—makes it rather easy to “understand” the image from Lua code.

There's actually another subtlety here, which took me some time to figure out. If for each paragraph you run the above convert independently, then it will convert each region of the image into black-and-white based on what seems best just for that region. This inconsistency can lead to poor results for the image as a whole (certain “white” areas can become black simply because they are locally darker than the surrounding white pixels). So it's better to use a fixed threshold; I used convert x8se9.jpg -resize x265 -threshold "85%" bw.pbm to get the following:

thresholded bw.pbm

(Of course if you already have the original image in a good black-and-white form, you don't need any of this.)
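
Why a fixed threshold gives consistent results can be seen in a toy model. The sketch below is only an illustration under stated assumptions, not a description of ImageMagick's internals: it treats a row as grayscale values in 0..255 and marks a pixel black (1, as in PBM) when its value does not exceed the given fraction of the maximum. The helper name threshold_row is hypothetical.

```lua
-- Toy fixed-threshold binarisation (hypothetical helper, for illustration).
-- Returns a string of PBM-style digits: 1 = black (hole), 0 = white (text).
local function threshold_row(gray_values, max_value, fraction)
   local cutoff = max_value * fraction
   local bits = {}
   for i, value in ipairs(gray_values) do
      bits[i] = (value > cutoff) and '0' or '1'
   end
   return table.concat(bits)
end

print(threshold_row({255, 250, 200, 40, 10, 230}, 255, 0.85))  -- prints 001110
```

Because the cutoff depends only on the pixel value and not on its neighbours, a pixel is classified the same way whichever crop of the image it appears in.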

--[======================================================================[
   Using an area of image, translate into "runs" of 0s and 1s
--]======================================================================]
function get_runs()
   local base_filename = 'tmp-for-paragraph.pbm'
   -- We want to scale the image so that a height of \baselineskip is 1 pixel
   -- => total number of rows of pixels should be: (vsize/baselineskip)*(num_rows)
   local resized_image_height = math.floor(tex.vsize / tex.baselineskip.width) * global.num_rows
   local area1, area2 = image_areas_for_paragraph(resized_image_height)
   local filenames = {}
   local command =
      function(area, filename)
         return string.format(
                    [[convert "%s" -resize "x%s" -crop "%s" -threshold "85%%" -compress none "%s"]],
                    global.image_filename, resized_image_height, area, utils.safe_filename(filename))
      end
   if area1 ~= nil then
      local filename1 = '1' .. base_filename
      print(command(area1, filename1))
      os.execute(command(area1, filename1))
      table.insert(filenames, filename1)
   end
   if area2 ~= nil then
      local filename2 = '2' .. base_filename
      print(command(area2, filename2))
      os.execute(command(area2, filename2))
      table.insert(filenames, filename2)
   end
   local ret = {}
   for unused_filename_number, filename in ipairs(filenames) do
      local line_number = 0
      for line in io.lines(filename) do
         line_number = line_number + 1
         if line_number > 2 then  -- Exclude first two header lines
            local runs = {}
            local char = '0'  -- 0 is white in image, which means text in paragraph
            local run_length = 0
            for c in string.gmatch(line, '%d') do
               if c == char then
                  run_length = run_length + 1
               else
                  if char == '1' then run_length = -run_length end -- Black pixels are glue, negative.
                  table.insert(runs, run_length)
                  char = c
                  run_length = 1
               end
            end
            -- The leftover last run
            if char == '1' then run_length = -run_length end
            table.insert(runs, run_length)
            table.insert(ret, runs)
         end
      end
   end
   collectgarbage()  -- https://tex.stackexchange.com/a/404623/48
   return ret
end
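
The inner loop of get_runs can be tried on its own, without TeX or ImageMagick. The following sketch isolates it as a function (row_to_runs is a name introduced here, not one from pages-cutout.lua) that turns one line of PBM digits into signed run lengths, positive for text and negative for holes:

```lua
-- One PBM row (a string of 0s and 1s, spaces allowed) -> signed run lengths.
-- Positive = white pixels (text), negative = black pixels (hole).
local function row_to_runs(line)
   local runs = {}
   local char = '0'       -- rows are assumed to start with a (possibly empty) text run
   local run_length = 0
   for c in string.gmatch(line, '%d') do
      if c == char then
         run_length = run_length + 1
      else
         if char == '1' then run_length = -run_length end
         table.insert(runs, run_length)
         char = c
         run_length = 1
      end
   end
   -- The leftover last run
   if char == '1' then run_length = -run_length end
   table.insert(runs, run_length)
   return runs
end

print(table.concat(row_to_runs('0 0 0 1 1 0 0'), ' '))  -- prints "3 -2 2"
print(table.concat(row_to_runs('1100'), ' '))           -- prints "0 -2 2"
```

Note that a row starting with a black pixel yields a leading text run of length 0, as in the second call.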

Translating image data into a parshape

Now that we have the 0s and 1s of the image translated into “runs” of lengths of text and no-text, we need to feed that into TeX as a parshape. (A refinement here is to discard any run of “text” that occupies a very tiny fraction of the line, because we don't want text that's just a letter's width, say.) For each run of text, we create in our parshape a “line” of that length, with indentation equal to the sum of everything that came before. (Similar to Knuth's solution to the window problem: see the image in section “Example” above.) Later we are going to put these boxes together. Note: Here we also need to maintain data about whether a particular line should be typeset on top of (overlap) the previous one, and whether this line should be preceded by “blank lines” (if so how many). As a bit of a hack, I keep this data in the third element of the array, though it would be better to keep it separate.

--[======================================================================[
   Translate runs into a parshape, avoiding tiny runs of text
--]======================================================================]
function get_paragraph_spec()
   local runs = get_runs()
   runs = clean_runs(runs, 0.02)
   local ret = runs_to_parshape(runs)
   return ret
end

-- Removes all text (positive numbers) that have width less than min_frac of the total length of the line.
-- This function can probably be simplified, as it looks like a lot of code for something so simple.
function clean_runs(runs, min_frac)
   local ret = {}
   for i, linespec in ipairs(runs) do
      local linesum = 0
      for j, elemspec in ipairs(linespec) do linesum = linesum + math.abs(elemspec) end
      local newlinespec = {}
      local add_to_next_glue = 0
      for j, elemspec in ipairs(linespec) do
         if elemspec > 0 then
            if elemspec / linesum < min_frac then
               if #newlinespec > 0 then
                  newlinespec[#newlinespec] = newlinespec[#newlinespec] - elemspec
               else
                  add_to_next_glue = add_to_next_glue + elemspec
               end
            else
               table.insert(newlinespec, elemspec)
            end
         else
            elemspec = elemspec - add_to_next_glue
            add_to_next_glue = 0
            if #newlinespec > 0 and newlinespec[#newlinespec] <= 0 then
               newlinespec[#newlinespec] = newlinespec[#newlinespec] + elemspec
            else
               table.insert(newlinespec, elemspec)
            end
         end
      end
      table.insert(ret, newlinespec)
   end
   return ret
end
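
As the comment says, this can probably be simplified. One possible shorter version is sketched below, under the assumption that the runs alternate between text and glue as produced by get_runs. It is an alternative named clean_runs_simple here, not the code used for the poster: too-narrow text runs are first turned into glue, and any glue is then merged into a glue run directly before it.

```lua
local function clean_runs_simple(runs, min_frac)
   local ret = {}
   for _, linespec in ipairs(runs) do
      local linesum = 0
      for _, run in ipairs(linespec) do linesum = linesum + math.abs(run) end
      local newlinespec = {}
      for _, run in ipairs(linespec) do
         local r = run
         -- A text run narrower than min_frac of the line becomes glue.
         if r > 0 and r / linesum < min_frac then r = -r end
         -- Glue directly after glue is merged into a single run.
         local last = newlinespec[#newlinespec]
         if r <= 0 and last ~= nil and last <= 0 then
            newlinespec[#newlinespec] = last + r
         else
            table.insert(newlinespec, r)
         end
      end
      table.insert(ret, newlinespec)
   end
   return ret
end

local cleaned = clean_runs_simple({{1, -5, 60, -4, 1, -10, 19}}, 0.02)
print(table.concat(cleaned[1], ' '))  -- prints "-6 60 -15 19"
```

In this example the two one-pixel text runs (1% of a 100-pixel line) are absorbed into the neighbouring holes.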

function runs_to_parshape(runs)
   --[[
      Example: for a paragraph shaped like
          aaaaaaaaaaaaaa       aaaaaaaaaaaaaa
          bbb     bbbbbb   bbbbbbb   bbbbbbbb
          ccccccccccccccccccccc              
                          ddddd              
       and therefore input (`runs`) like:
           {
              {14, -7, 14},
              {3, -5, 6, -3, 7, -3, 8},
              {21, -14},
              {-16, 5, -14},
           }
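      this function returns (showing only indent and width; each entry also
      carries a third element, the glue count, and the actual list starts
      with a leading {hsize, 0, 0} entry)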
   {
      {0            , hsize * 14/35},
      {hsize * 21/35, hsize * 14/35},
      {0            , hsize *  3/35},
      {hsize *  8/35, hsize *  6/35},
      {hsize * 17/35, hsize *  7/35},
      {hsize * 27/35, hsize *  8/35},
      {0            , hsize * 21/35},
      {hsize * 16/35, hsize *  5/35},
      {0, hsize},
   }
   --]]
   local hsize = tex.hsize
   local myparshape = {{hsize, 0, 0}}
   local prev_baselineskip_glue = 0 -- How many multiples of baselineskip to add before a line
   for i, linespec in ipairs(runs) do
      local linesum = 0
      for j, elemspec in ipairs(linespec) do linesum = linesum + math.abs(elemspec) end
      local cursum = 0
      if prev_baselineskip_glue < 0 then
         prev_baselineskip_glue = 0
      end
      for j, elemspec in ipairs(linespec) do
         if elemspec > 0 then
            table.insert(myparshape, {hsize * cursum / linesum, hsize * elemspec / linesum, prev_baselineskip_glue})
            prev_baselineskip_glue = -1 -- Because after the first line, we need to add a negative glue each time
         end
         cursum = cursum + math.abs(elemspec)
      end
      if prev_baselineskip_glue ~= -1 then
         -- No text has been added so this line is fully glue, which means the next line must be preceded by \baselineskip
         assert(cursum == linesum)
         myparshape[#myparshape][3] = myparshape[#myparshape][3] + 1
      end
   end
   table.insert(myparshape, {0, hsize, 0})
   return myparshape
end

Putting it all together

The rest of the code. :-) Here we tweak some TeX parameters and set up a linebreak_filter which LuaTeX calls to break any paragraph into lines. This filter uses all the code from previous sections above to get the parshape, then uses TeX's default line-breaking (tex.linebreak) with that parshape. Then, it inserts the appropriate glue (negative or positive) so that each “line” is at the right offset from the previous one. Note that when we want “blank lines” (either at the top of a paragraph or inside) we should be careful to insert something nondiscardable (here, a rule) before it, so that TeX doesn't discard this glue (at page breaks, say). This took some time to figure out.

--[======================================================================[
   Putting it all together
--]======================================================================]
-- The main function / “interface” to this code.
function pagesWithCutout(text_filename, num_rows, num_columns, image_filename, is_latex)
   image_filename = utils.safe_filename(image_filename)
   global.image_filename = image_filename
   global.image_width  = tonumber(utils.get_output('identify -format "%w" ' .. image_filename))
   global.image_height = tonumber(utils.get_output('identify -format "%h" ' .. image_filename))
   global.num_rows = num_rows
   global.num_columns = num_columns
   local setup = nil
   if is_latex then setup = [[\pagestyle{empty}]] else setup = [[\nopagenumbers]] end -- Turn off page numbers
   setup = setup .. [[\parskip=5pt \raggedbottom]] -- So that inter-paragraph glue stretch does not cause problems
   setup = setup .. [[\hyphenpenalty=0 \lefthyphenmin=1 \righthyphenmin=1 \tolerance=9999 \emergencystretch=3em ]] -- Avoiding overfull boxes as much as possible
   setup = setup .. [[\overfullrule=0pt\relax ]] -- For the few overfull boxes that do happen
   tex.print(setup)
   luatexbase.add_to_callback('linebreak_filter', shape_paragraph, 'Typeset each paragraph according to the "shape" from image.')
end

-- A linebreak_filter: For a given paragraph, determines the required shape and typesets accordingly.
function shape_paragraph(head, is_display)
   local myparshape = get_paragraph_spec()
   local leading_glue = table.remove(myparshape, 1)
   local broken, info = tex.linebreak(head, {parshape=myparshape})
   tex.prevdepth = info.prevdepth -- https://tex.stackexchange.com/q/403801/48
   tex.prevgraf = info.prevgraf
   -- Insert proper glue (negative `baselineskip`s) so that the lines overlap as they should.
   -- First insert the leading glue
   local tmp = utils.find_first_of_type_in(broken, 'hlist')
   assert(tmp ~= nil, 'Empty paragraph? Nowhere to insert this glue')
   broken = insert_nondiscardable_glue_before(broken, tmp, tex.baselineskip.width, leading_glue[3])
   -- Next insert the rest of the glue
   for i, linespec in ipairs(myparshape) do
      tmp = utils.find_first_of_type_in(tmp, 'hlist')
      if tmp == nil then break end
      -- Insert `linespec[3]` number of baselineskip glue before `tmp`
      broken = insert_nondiscardable_glue_before(broken, tmp, tex.baselineskip.width, linespec[3])
      tmp = tmp.next
   end
   return broken
end

function insert_nondiscardable_glue_before(head, tmp, glue_width, times)
   for i = 1, math.abs(times) do
      local my_glue = node.new('glue')
      node.setglue(my_glue, glue_width * utils.sign(times))
      head = node.insert_before(head, tmp, my_glue)
      local rule = node.new('rule')
      rule.height = 0
      rule.depth = 0
      rule.width = 0
      rule.subtype = 1  -- box (see LuaTeX manual)
      head = node.insert_before(head, my_glue, rule)
   end
   return head
end

The above is (roughly) pages-cutout.lua, and it uses some utility functions that I moved to a separate utils.lua. It won't fit within the 30000-character limit for this answer so I've put the two files together here.


Possibly relevant

While doing all this, I came across some other potentially useful/relevant sources of LuaTeX stuff that I haven't yet looked into: LuaTeX-ja and michal-h21/linebreaking. I noticed Speedata Publisher uses Imageshaper (invoking ImageMagick) for solving a similar problem. Some more assorted LuaTeX goodies: TUGboat, Douglas, traversing, visualization, nodetree, post_linebreak_filter, nodes back to text, TUGboat, …


Final words

Here's the result using LaTeX not plain TeX, and with pages in a 6x5 layout instead of 5x6:

full3.pdf

Note: All this is tested only for this text+image; remaining bugs/enhancements (those stray words in the belly of the insect, preserving the aspect ratio of the image, accounting for margins, and many more one can think of) are left to the ambitious reader. :-)
