[TeX/LaTeX] Automated Newspaper Layout (with TeX and beyond)

automation, newspaper, page-breaking, positioning

Update: I'd like to revive (and revise) this question a bit because there have been some recent developments, and furthermore I would be happy to encourage some up-to-date discussion.

I'm thinking (from a professional point of view) about fully automatic generation of newspapers from data.

More precisely, the system under consideration would get as input an 'attributed' data stream of articles (subject classification, headers, author info, etc., text, images) plus some hints on how things should be laid out, but only on the level of "lead story", "short message", "weather report".

As output, a complete newspaper would be generated automatically without further user interaction (with a focus on print, not online; i.e. PDF rather than HTML).
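To make the input side concrete, here is a minimal sketch of what one record in such an attributed article stream might look like. The names (`Article`, `Role`, `Image`, `by_role`) are hypothetical and not DocScape's actual input format; the point is only that the layout hint is a coarse role, not geometry.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    # Coarse layout hints as described above; this set of roles is illustrative.
    LEAD_STORY = "lead story"
    SHORT_MESSAGE = "short message"
    WEATHER_REPORT = "weather report"


@dataclass
class Image:
    path: str
    width_px: int
    height_px: int


@dataclass
class Article:
    subject: str   # subject classification
    header: str
    author: str
    text: str
    role: Role     # the only layout information the generator receives
    images: List[Image] = field(default_factory=list)


def by_role(stream: List[Article], role: Role) -> List[Article]:
    """Select all articles carrying a given layout hint, keeping stream order."""
    return [a for a in stream if a.role is role]
```

Everything geometric (column spans, positions, image crops) would have to be derived by the layout algorithm from records like these.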

Note that I'm not looking for help on how to do this with LaTeX. There won't be any technical difficulties with page and article layout using my system DocScape. I'm asking (myself) about the basic algorithm for "geometrically" generating the page layout from the given content stream. There has to be some 'artificial intelligence' in there to make the newspaper look good from a professional newspaper editor's point of view as well.

Of course, any production-quality system would yield a valid answer, including those based on TeX 😉

Googling yields some interesting references, but it's hard to tell which of them would really lead to an effective implementation. I'm not talking about an academic exercise here but about a real system that a publisher would use to produce hundreds of newspapers each week.

There are further interesting references in the area of floorplanning for VLSI layout, but these lack consideration for the specific needs of newspapers, of course 😉

Now, my questions, stated a bit more precisely:

  1. Does a system like the one described above actually exist (it doesn't have to be based on TeX)? I'd be interested in pointers to concrete systems as well as publications about them.
  2. Are there publishers who really use a system like this for making newspapers (online would be interesting as well)?
  3. Has anyone here ever worked with such a system and would care to describe how it's used?
  4. What are the most interesting "scientific" publications on this subject which I should consider when designing such a system myself?

I have seen the question Automatic newspaper creation in LaTeX, but it has a slightly different focus than mine (what LaTeX tools to use), and unfortunately the discussion there wasn't very intense and yielded no pointers that would help me.

Some Literature

Here I'll add a review of the literature I've collected on the subject. Note that I have not read all of it, so if I have misrepresented something, please comment.

  1. Schoon, Benjamin Durant
    Fishpaper : automatic personalized newspaper layout
    Thesis (B.S.)-Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.

    More of a historical account than a real contribution to this subject. "automatic personalized newspaper layout" here doesn't include automatically finding a good page layout. The page layout is given by a fixed template, though the system supposedly can account for different text lengths or image sizes of article content, or display alternative content when some element is missing.

    It is historically interesting because it dates from the advent of the WWW. The browser Mosaic is explicitly mentioned as a device for presenting news items electronically, but at a time before HTML 2.0 the possibilities for screen formatting were apparently limited. TeX is also mentioned explicitly, as a somewhat competing product to the software presented, Fishpaper, which produces PostScript files from a given stream of news content and given page layout templates.

    [Figure: example from the paper]

  2. Gonzalez J, Rojas I, Pomares H, Salmeron M, Merelo JJ.
    Web Newspaper Layout Optimization Using Simulated Annealing

    IEEE Trans Syst Man Cybern B Cybern. 2002;32(5):686-91.

    Thanks to Martin for the link.

    This is a classical research paper in the sense that the main focus lies on applying a specific optimization method (simulated annealing) to a precisely mathematically specified problem (web newspaper layout).

    The concrete results shown in the paper are not overwhelming, and in some sense the problem solved is not fully compatible with my own interest (it targets web pages, so there is no length restriction on the page produced; furthermore, the design of individual articles is rather uninspired). Still, judging from the results shown, the method could probably be extended towards solving the "complete" problem discussed here. Furthermore, the algorithm is tailored for "real-time" application and takes only a couple of seconds for a realistic sample size.

    [Figure: example from the paper]
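To give a feel for the simulated annealing approach used by Gonzalez et al., here is a toy sketch. It is not their formulation: it assumes a deliberately simplified model in which each article is a block of fixed height (in lines) that must be assigned to one column of a single page, and the cost only penalizes overflow (heavily) and unused space at the foot of each column. The function names, weights, and cooling schedule are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def cost(assign: Sequence[int], heights: Sequence[int], n_cols: int, page_h: int) -> float:
    """Penalty for a toy layout: article i (heights[i] lines) sits in column
    assign[i]; overflow is weighted far above white space."""
    col_h = [0] * n_cols
    for art, col in enumerate(assign):
        col_h[col] += heights[art]
    total = 0.0
    for h in col_h:
        if h > page_h:
            total += 10.0 * (h - page_h)  # text that does not fit on the page
        else:
            total += page_h - h           # unused lines at the column foot
    return total


def anneal(heights: Sequence[int], n_cols: int, page_h: int, steps: int = 20000,
           t0: float = 50.0, alpha: float = 0.9995, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    assign = [rng.randrange(n_cols) for _ in heights]
    cur = cost(assign, heights, n_cols, page_h)
    best, best_cost = list(assign), cur
    t = t0
    for _ in range(steps):
        # Neighbour move: put one randomly chosen article into a random column.
        i = rng.randrange(len(heights))
        old = assign[i]
        assign[i] = rng.randrange(n_cols)
        cand = cost(assign, heights, n_cols, page_h)
        delta = cand - cur
        # Metropolis criterion: always accept improvements, and accept worse
        # layouts with probability exp(-delta / t) to escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur = cand
            if cur < best_cost:
                best, best_cost = list(assign), cur
        else:
            assign[i] = old  # reject: undo the move
        t *= alpha  # geometric cooling schedule
    return best, best_cost
```

For example, `anneal([30, 10, 20, 20, 15, 5], n_cols=2, page_h=50)` looks for a split of six articles into two columns of 50 lines each. A real system would add moves such as changing an article's column span or swapping stories, plus many more cost terms, but the accept/reject loop stays the same.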

The State of the Art?

Since I started thinking about this subject, ads and blog posts about systems for making newspapers keep popping up for me 😉

Without being affiliated with any specific vendor, I'd just mention two examples that seem to represent the state of the art for systems that make newspaper layout easy:

A tool named "publishing cloud" seems to be a good representative of a large range of almost equivalent editing systems (easy to find with Google) built around an easy-to-use, web-based layout editor that is, however, template-based, with a mostly manual page layout process. These tools automate several stages of the publishing process, offering import filters for content (mostly to get content from web pages or newswire systems) and export to PDF or digital printing services, but not the part I'm interested in here, namely the process of arranging content on the document pages.

I would be interested in any hint that one of the systems in this area offers "real" automatic page layout for a non-trivial newspaper layout, not just a really easy-to-use web front end for doing it manually.

Last but not least, I should mention that we have implemented a newsletter-generating system for a news agency which generates different types of newsletters fully automatically every day and every week:

  1. EPD Wochenspiegel, a weekly news compilation which exists in a multitude of local and thematic variants.
  2. EPD Medien, another weekly newsletter with a specific theme (media) and a slightly different layout.
  3. EPD Zentralausgabe, a daily newsletter, again existing in multiple local variants.

On the linked pages, you can download example PDF files to take a look at the different layouts.

Here, everything is fully automatic: only the articles to be compiled have to be selected in the wire service application. But the layout is not what I would consider "newspaper layout", so these examples represent the state of the art we can currently produce but do not answer my question.

Best Answer

Not much of an answer, more a couple of loose thoughts ...

Offhand, I'm not aware of any such system, nor of any research that deals with automatic newspaper layout. As far as I know, there have been only very, very limited attempts to approach automatic typesetting with more complex layout rules and dependencies that go beyond what is largely a linear process. You can count them on your fingers:

  • Michael Plass (under Knuth)
  • Graham Asher in 1990 or so (Type & Set) - not sure what happened to that
  • Anne Brüggemann-Klein in the mid-90s
  • Richard Furuta and a few others in the 90s
  • Stefan Wohlfeil 1997 (PhD: On the Pagination of Complex Book-like Documents)

and to my knowledge nada otherwise. And those all look more at the questions arising from "book-like" documents than at newspapers/journals. But I might be very wrong, as I haven't followed that area closely over the last 10 years.

But assuming for a moment that my knowledge is correct, it isn't really surprising, is it? What you have is a global optimization problem over a constraint system, where the number of possibilities you need to test grows astronomically the moment you have more than a single column and a good number of floats with a certain set of constraints. And so far, every serious attempt to do much better than the trivial way out (no floats, just linear typesetting, aka the MS Word model) or a simple greedy algorithm that never looks back (like LaTeX does) has been defeated by the complexity of the task.
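To make the contrast between "never look back" and a global search concrete, here is a small sketch under a deliberately simplified model: items of fixed height packed onto pages of fixed capacity, order ignored, no real float semantics. The greedy `next_fit` never revisits a page once it has moved on; it is a crude stand-in for the idea, not LaTeX's actual float algorithm. `min_pages_exhaustive` tries every assignment, and there are k**n of them for n items on k pages (with 10 items and 10 pages that is already 10**10). Both function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def next_fit(heights: Sequence[int], capacity: int) -> List[List[int]]:
    """Greedy: fill the current page; once an item does not fit, start a new
    page and never revisit earlier ones."""
    pages: List[List[int]] = [[]]
    used = 0
    for h in heights:
        if used + h > capacity and pages[-1]:
            pages.append([])
            used = 0
        pages[-1].append(h)
        used += h
    return pages


def min_pages_exhaustive(heights: Sequence[int], capacity: int) -> int:
    """Try every assignment of items to k pages, k = 1, 2, ...; the search
    space is k ** len(heights). Assumes every item fits on a page by itself."""
    n = len(heights)
    for k in range(1, n + 1):
        for assign in product(range(k), repeat=n):
            load = [0] * k
            for h, page in zip(heights, assign):
                load[page] += h
            if max(load) <= capacity:
                return k
    return n
```

For heights `[6, 5, 5, 4]` and a page capacity of 10, the greedy version needs three pages (`[[6], [5, 5], [4]]`), while the exhaustive search finds two (`{6, 4}` and `{5, 5}`) by reordering, which is exactly the kind of freedom a newspaper has to some extent. The exhaustive approach is hopeless beyond toy sizes, which is the point made above.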

Now, newspaper typesetting on the one hand comes with the additional complexity (but perhaps also the freedom) of having multiple input streams of limited length that allow for reordering (to some extent). On the other hand, it will have very different requirements for picture order and call-outs.

By the way, to my knowledge it is quite common in newspaper writing that authors have to write to length, and if they don't, they get edited to it. Are you thinking of taking that into account? Because if so, that would probably simplify the task considerably.
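If every article arrives already written (or edited) to length, it becomes a box of known size, and the layout task reduces to arranging boxes. The length check itself is then simple arithmetic. The sketch below uses a classic copy-fitting estimate based on an average number of characters per line; that average is an assumption, and real line breaking with hyphenation and justification will deviate from it. The function names are hypothetical.

```python
import math


def lines_needed(n_chars: int, chars_per_line: float) -> int:
    """Rough copy-fitting estimate of how many lines a text fills in one column."""
    return math.ceil(n_chars / chars_per_line)


def chars_to_cut(n_chars: int, slot_lines: int, chars_per_line: float) -> int:
    """Characters an editor would have to remove so the text fits a slot of
    `slot_lines` lines (0 if it already fits)."""
    capacity = math.floor(slot_lines * chars_per_line)
    return max(0, n_chars - capacity)
```

For instance, a 1000-character story at about 40 characters per line needs 25 lines; to fit a 20-line slot, roughly 200 characters would have to go.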

So I think the first task would be to understand and research the constraint system, e.g., what kind of rules make newspapers or journals tick. Those will not be universal, and most likely they contradict each other if taken all together. But they form the basis of what an algorithm needs to be configurable for. Only when those boundaries are known can one delve deeper into the question of designing such an algorithm. How close one can get to an ideal, I don't know. In some respects, I would assume that it might in fact be simpler for newspapers due to the flexibility of reordering stories, but in any case I believe this is an open research topic that is so far unsolved (just like "the pagination of complex book-like documents" effectively is). I'm certainly interested, and have been for more than two decades, even if I had to take a longer break after the millennium.
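One common way to make such a rule set configurable, and to cope with rules that cannot all be satisfied at once, is to treat most rules as weighted penalties and only a few as hard constraints; an optimizer (such as the annealing approach mentioned in the question) then minimizes the total. The sketch below assumes a hypothetical layout summary (a dict with `lead_row`, `images_per_col`, `empty_lines`); the rule names and weights are made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical summary of a candidate page layout.
Layout = Dict[str, object]


@dataclass
class Rule:
    name: str
    weight: float                       # tuned per publication
    penalty: Callable[[Layout], float]  # 0.0 when the rule is satisfied
    hard: bool = False                  # hard rules may never be violated


def score(layout: Layout, rules: List[Rule]) -> float:
    """Weighted sum of soft-rule penalties; infinite if any hard rule fails."""
    total = 0.0
    for rule in rules:
        p = rule.penalty(layout)
        if rule.hard and p > 0:
            return math.inf
        total += rule.weight * p
    return total


# Illustrative rule set: soft rules are traded off against each other by weight.
RULES = [
    Rule("lead story in top row", 1.0,
         lambda lay: float(lay["lead_row"] != 0), hard=True),
    Rule("no stacked images", 5.0,
         lambda lay: sum(max(0, n - 1) for n in lay["images_per_col"])),
    Rule("little white space", 0.5, lambda lay: lay["empty_lines"]),
]
```

With `{"lead_row": 0, "images_per_col": [1, 2, 0], "empty_lines": 12}` the score is 5.0 + 6.0 = 11.0; moving the lead story out of the top row makes the layout inadmissible. Different publications would differ mainly in which rules they enable and how they weight them.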

I don't know if Wohlfeil's PhD work is still easily available (it was difficult for me to get back then), but a quick search on the web brought up a shorter paper by Brüggemann-Klein/Klein/Wohlfeil, "On the Pagination of Complex Documents", which is from around the same time. I also found "Pagination reconsidered" by the same authors (no date given, but judging from its number it was probably earlier).

There are probably many other sources, but one good book that I think is worth looking at for those who read German is "Praxishandbuch Gestaltungsraster" by Andreas and Regina Maxbauer. Its focus isn't the newspaper angle but rather grids, which naturally covers a good number of possible rules.
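The basic arithmetic of a column grid is part of what turns free-form box sizes into a small set of allowed widths. A sketch using the usual formula (column width = (type area width minus all gutters) / number of columns); the function names and the example numbers are illustrative and not taken from the book.

```python
def column_width(area_width: float, n_cols: int, gutter: float) -> float:
    """Width of one grid column in a type area with n_cols columns."""
    return (area_width - (n_cols - 1) * gutter) / n_cols


def span_width(k: int, col_w: float, gutter: float) -> float:
    """Width of a box spanning k adjacent columns (including k - 1 gutters)."""
    return k * col_w + (k - 1) * gutter


def snap_to_grid(width: float, col_w: float, gutter: float, n_cols: int) -> int:
    """Smallest number of columns whose span is at least `width`,
    capped at the full grid width."""
    for k in range(1, n_cols + 1):
        if span_width(k, col_w, gutter) >= width:
            return k
    return n_cols
```

With a 284 mm type area, 6 columns, and 4 mm gutters, each column is 44 mm wide, a two-column box is 92 mm, and an image that wants to be 100 mm wide snaps to three columns (140 mm).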

By the way, a good way to do some research (though far from perfect at the moment) is to look around in Microsoft's Academic Search. For example, it gives you some more background on what Anne was doing over the years and which papers she co-authored. But be aware that there is a lot of rubbish in their data and that it is horribly incomplete in parts.

Update

Upon reading a bit in Stefan's PhD thesis again (which I initially labeled, incorrectly, a habilitation thesis), I came across the work of Krista Lagus, whose master's thesis was about "Automated pagination of the generalized newspaper using simulated annealing". I didn't find the thesis on the web, but perhaps it is worth exploring further.
