I investigated the option of using LuaTeX's node processing callbacks. The best suited one is pre_output_filter, which is called when a page is ready for output. I've created a simple package named boxes, which consists of two files: the LaTeX package boxes.sty and the Lua module boxes.lua.
boxes.sty:
\ProvidesPackage{boxes}
\RequirePackage{luacode}
\RequirePackage{kvoptions}
\DeclareStringOption[eng]{lang}
\DeclareStringOption[72]{resolution}
\DeclareStringOption[75pt]{startx}
\DeclareStringOption[67.5pt]{starty}
\ProcessKeyvalOptions*
\luaexec{%
main_language = "\boxes@lang"
resolution = tonumber("\boxes@resolution")
startx = tex.sp("\boxes@startx")
starty = tex.sp("\boxes@starty")
}
\begin{luacode*}
print("language", main_language)
local boxes = require "boxes"
boxes.resolution = resolution
boxes.startx = startx
boxes.starty = starty
--boxes.set_name()
-- save the glyph boxes of each page just before it is shipped out
luatexbase.add_to_callback("pre_output_filter",
  function(head, info, size, pack, maxdpth)
    local f = font.getfont(font.current())
    local fontname = f.psname or f.fullname
    -- file name pattern: <lang>.<fontname>.exp<N>.box
    local name = string.format("%s.%s.exp%i.box", main_language, fontname, 0)
    local glyphs = boxes.traverse(head)
    if #glyphs > 0 then boxes.save(name, glyphs) end
    return head
  end, "Save node boxes")
\end{luacode*}
\endinput
This file is simple. The important things to note are the package options:
- lang: the processed language.
- resolution: I haven't found out at which resolution the boxes file should be saved, so I think it is a good idea to make it configurable. The default is 72 ppi.
- startx and starty: I can't find a way to calculate the top left corner of the text block. It really depends on the document class used and on packages like geometry. I hope there is some way to determine it, but I don't know how, so we must set these values by hand, with some experimenting.
And now the Lua module, boxes.lua:
local boxes = {}
boxes.resolution = 300 --72
local pt = 2 ^ 16
local uchar = unicode.utf8.char
local total_height = tex.pageheight
local pagebox = tex.pdfpagebox
local baselineskip = tex.baselineskip.width
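local function round(num, idp)
  -- without idp, return a whole number, so that Lua 5.3 writes 75 and not 75.0
  if not idp or idp == 0 then return math.floor(num + 0.5) end
  local mult = 10^idp
  return math.floor(num * mult + 0.5) / mult
end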
local function make_dimensions(glyph, x, y)
  local resolution = boxes.resolution
  -- bp: number of scaled points per output pixel (1in = 72.27pt)
  local bp = (2 ^ 16) / (resolution / 72.27)
  local lx = round(x / bp)
  local ly = round((total_height - (y + glyph.depth)) / bp)
  local rx = round((x + glyph.width) / bp)
  local ry = round((total_height - (y - glyph.height)) / bp)
  return lx, ly, rx, ry
end
function boxes.traverse(head)
  --local set = head.glue_set
  --local sign = head.glue_sign
  --local order = head.glue_order
  -- look node ids up by name: the numeric glyph id differs between LuaTeX versions
  local hlist_id = node.id("hlist")
  local glyph_id = node.id("glyph")
  local glyphs = {}
  local i = 0
  for n in node.traverse(head) do
    print(n.id, n.subtype) -- debugging output
    if n.id == hlist_id then
      i = i + 1
      local set = n.glue_set
      local sign = n.glue_sign
      local order = n.glue_order
      local height = i * baselineskip
      local nhead = n.head
      -- y is distance from page top to the current baseline
      -- (parenthesized: "+" binds tighter than "or", so a nil starty used to raise an error)
      local y = (boxes.starty or (tex.pdfvorigin - 4.5 * pt)) + height
      local x = boxes.startx or (tex.pageleftoffset + 2.5 * pt)
      for glyph in node.traverse_id(glyph_id, nhead) do
        -- w is the width from the start of the line up to this glyph
        local w, h, d = node.dimensions(set, sign, order, nhead, glyph)
        local glyph_x = x + w
        local lx, ly, rx, ry = make_dimensions(glyph, glyph_x, y)
        glyphs[#glyphs+1] = {uchar(glyph.char), lx, ly, rx, ry}
      end
    end
  end
  return glyphs
end
function boxes.save(name, glyphs)
  -- assert reports the reason if the file cannot be opened
  local f = assert(io.open(name, "w"))
  for _, line in ipairs(glyphs) do
    f:write(table.concat(line, ", ") .. "\n")
  end
  f:close()
end
return boxes
The code is really simple: in the function boxes.traverse we process the list of line nodes. When we find a node with id 0, which is an hlist node holding one horizontal line, we increase the line count and advance the vertical position by \baselineskip. This works as long as the text is simple, without more advanced formatting that would cause vertical space bigger than \baselineskip. For this specific purpose we may assume that only plain text without formatting is used.
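To make the line counting concrete, here is a minimal sketch in plain Lua (no LuaTeX needed). The helper baseline_y is a made-up name, and the 14.5pt baselineskip is an assumed value; in the module the real one comes from tex.baselineskip.

```lua
-- Sketch only: plain Lua, the numbers are assumed values.
local pt = 2 ^ 16                 -- scaled points (sp) per TeX point

-- Baseline of the n-th line, measured from the top of the page, in sp.
local function baseline_y(starty, baselineskip, line)
  return starty + line * baselineskip
end

local starty = 67.5 * pt          -- package default of the starty option
local baselineskip = 14.5 * pt    -- assumed value

for line = 1, 3 do
  -- prints 82.0pt, 96.5pt and 111.0pt for lines 1, 2 and 3
  print(line, string.format("%.1fpt", baseline_y(starty, baselineskip, line) / pt))
end
```

Any extra vertical material between lines (a \vspace, a display, a larger font) is not counted by this formula, which is why the approach is limited to plain text.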
We then process the child list for glyph nodes and calculate the horizontal position with the node.dimensions function:
local w, h, d = node.dimensions(set, sign, order, nhead, glyph)
set, sign and order are used to calculate the size of the spaces: because a space has variable width, it may be a little bit different on each line. These values are set in the parent hlist node. The variable w is the width from the beginning of the line up to the current glyph.
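As an illustration of what set, sign and order do, here is a small sketch of the usual TeX glue rule in plain Lua. It uses ordinary tables instead of real LuaTeX nodes, the function name effective_glue_width is made up, and the space metrics are assumed values; node.dimensions does this bookkeeping for you.

```lua
-- Sketch only: a plain table stands in for a glue node, all lengths in sp.
local pt = 2 ^ 16

-- sign: 0 = natural width, 1 = line was stretched, 2 = line was shrunk
local function effective_glue_width(glue, set, sign, order)
  if sign == 1 and glue.stretch_order == order then
    return glue.width + glue.stretch * set
  elseif sign == 2 and glue.shrink_order == order then
    return glue.width - glue.shrink * set
  end
  return glue.width
end

-- assumed metrics of an interword space
local space = {
  width = 4 * pt, stretch = 2 * pt, shrink = 1 * pt,
  stretch_order = 0, shrink_order = 0,
}

-- prints 5.00pt for a stretched line, 3.50pt for a shrunk one
print(string.format("%.2fpt", effective_glue_width(space, 0.5, 1, 0) / pt))
print(string.format("%.2fpt", effective_glue_width(space, 0.5, 2, 0) / pt))
```

The same space is therefore 5pt wide on one line and 3.5pt on another, which is why the glyph positions have to be computed per line.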
Then we calculate the dimensions of the character with the function
local function make_dimensions(glyph, x, y)
  local resolution = boxes.resolution
  local bp = (2 ^ 16) / (resolution / 72.27)
  local lx = round(x / bp)
  local ly = round((total_height - (y + glyph.depth)) / bp)
  local rx = round((x + glyph.width) / bp)
  local ry = round((total_height - (y - glyph.height)) / bp)
  return lx, ly, rx, ry
end
The variables x and y give the glyph's reference point: x is its left edge and y is the distance from the top of the page to the baseline. Because the coordinate system in TeX begins at the top left, while for the boxes format it starts at the bottom left, all vertical dimensions must be mirrored, simply by subtracting the calculated y value from the total height of the page. At the end, the calculated dimensions are scaled to the current resolution by dividing by the bp variable.
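A worked example of the mirroring and scaling, as a plain Lua sketch that runs without LuaTeX. The function to_pixels is a hypothetical variant of make_dimensions that takes the page height and resolution as arguments; the glyph metrics and the 800pt page height are made-up numbers chosen for easy arithmetic.

```lua
-- Sketch only: plain Lua with assumed glyph metrics, all lengths in sp.
local pt = 2 ^ 16

local function round(value)
  return math.floor(value + 0.5)
end

-- Same arithmetic as make_dimensions, with page height and resolution passed in.
local function to_pixels(glyph, x, y, page_height, resolution)
  local sp_per_pixel = pt / (resolution / 72.27)   -- called bp in boxes.lua
  local lx = round(x / sp_per_pixel)
  local ly = round((page_height - (y + glyph.depth)) / sp_per_pixel)
  local rx = round((x + glyph.width) / sp_per_pixel)
  local ry = round((page_height - (y - glyph.height)) / sp_per_pixel)
  return lx, ly, rx, ry
end

local glyph = { width = 6 * pt, height = 7 * pt, depth = 2 * pt }  -- assumed
local lx, ly, rx, ry = to_pixels(glyph, 75 * pt, 82 * pt, 800 * pt, 300)
print(lx, ly, rx, ry)  -- 311 2972 336 3010
```

The baseline sits 82pt below the top of the page, so the bottom of the box is 800 - (82 + 2) = 716pt above the bottom edge, and 716pt at 300 ppi is about 2972 pixels.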
\documentclass[fontsize=12,a4paper,%headheight=0.5cm,headsepline,
parskip=half-]{scrartcl}
%\documentclass{article}
\usepackage[resolution=300]{boxes}
%\KOMAoptions{BCOR=0mm,DIV=40}
\usepackage{fontspec}
%\setmainfont{Helvetica-Compressed}
\begin{document}
\typeout{\the\baselineskip}
‘ kit Contacts Carte Type un forme ç~ avant BYW: EN monde 2001 qu'on plan image ZG 23 À+ niveau femmes příliš žluťoučký kůň úpěl ďábelské ódy.
\end{document}
We can visualize the boxes with the tessboxes command. It needs an image in pbm format, which can be created with the pdftoppm command:
lualatex sample
pdftoppm -mono -freetype yes -aa yes -r 300 sample.pdf > sample.pbm
tessboxes sample.pbm eng.LMRoman12-Regular.exp0.box > output.pbm
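Before looking at the image, it can help to sanity-check the generated file. The following is only a sketch in plain Lua: it assumes the "char, lx, ly, rx, ry" line layout written by boxes.save, the name parse_box_line is made up, and the sample lines are invented data rather than real output.

```lua
-- Sketch only: parses lines in the layout written by boxes.save.
local function parse_box_line(line)
  local char, lx, ly, rx, ry = line:match(
    "^(.-), (%-?[%d%.]+), (%-?[%d%.]+), (%-?[%d%.]+), (%-?[%d%.]+)$")
  if not char then return nil end
  return { char = char, lx = tonumber(lx), ly = tonumber(ly),
           rx = tonumber(rx), ry = tonumber(ry) }
end

-- invented sample data; the second line describes a comma glyph
local sample = "k, 311, 2972, 336, 3010\n,, 340, 2970, 345, 2980\n"

local boxes_read = {}
for line in sample:gmatch("[^\n]+") do
  local box = assert(parse_box_line(line), "malformed line: " .. line)
  -- left must not exceed right, bottom must not exceed top
  assert(box.rx >= box.lx and box.ry >= box.ly, "inverted box: " .. line)
  boxes_read[#boxes_read + 1] = box
end
print(#boxes_read)  -- 2
```

To check a real file, read it with io.lines instead of iterating over the sample string.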
Best Answer
DVI is still the primary output format for anyone using pstricks, and an extended DVI format is the only output format from xetex. DVI is also the default (if not the most common) output from pdftex, so I think the question in the title is based on a false premise.
What is true is that PDF has pretty much replaced DVI as a distribution format on the web (or before the web, on ftp and email). I think the main issue there is just the ubiquity of a viewer. If you use PDF output, or convert your DVI to PDF, just about anyone with any sort of computer will already have a PDF viewer available. If you send someone a DVI file and they are not already a TeX user, they probably can neither read the file nor easily install a DVI viewer without installing an entire TeX distribution.