[TeX/LaTeX] Separation of content structure and style in LaTeX

context latex-misc

In HTML, over the last decade or so, there has been a strong push toward complete separation of content structure and style. Most websites are now built using HTML for structural markup and CSS for the presentation of that markup. This makes it easy to apply different styles to the same content: if you aren't aware of how powerful this is, see for instance http://www.csszengarden.com/. There is a W3C document making the case for separation of semantic and presentational markup.

I'm relatively new to LaTeX, but I have been designing websites for a while. Yes, they're different fields, but they're trying to do the same thing: present content well. My experience with LaTeX over the last 6 months leaves me feeling that this concept of separation of content and style hasn't worked its way very far into the TeX world. For instance, defining the wrapping rules of a table cell in HTML+CSS is as simple as adding a class to the cell and one line to your CSS document. In LaTeX, you need to do something horrible like this.

So, am I missing something, or is LaTeX? Is this concept of separation of content and style used in the design of LaTeX? Is it just poorly implemented? Is it likely to be implemented better in future versions (LaTeX3? ConTeXt?)?

Note: I mean no offence to LaTeX devs: the system is really nice for many other reasons. I just see this gaping hole, and little discussion around it, and I am wondering why.

Best Answer

History

Knuth wrote TeX in the late 1970s because he wanted to typeset material as well as he could, given the limitations of his own knowledge and of the technology available at the time. It's generally agreed he did a pretty good job, but what he certainly was not trying to do was separate structure and style.

Lamport wrote LaTeX in the mid 1980s when he saw the need for a clearer separation of the two areas. LaTeX was revised in the early 1990s, and the current kernel dates from 1994 (with bug fixes, of course). This predates the HTML + CSS model by some time, and again technological limitations meant that further complication of LaTeX then would have been impossible. (In 1994, LaTeX was almost too large for many PCs, and the team worked very hard to squeeze it down.)
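As a rough illustration of the separation LaTeX does offer, here is a minimal sketch. It assumes the titlesec package is installed (my choice for the example, not something the kernel requires). The body uses only the structural command \section, while the appearance of every section heading is set once in the preamble.

```latex
\documentclass{article}

% Presentation: one declaration in the preamble controls all section headings.
% titlesec is an add-on package, assumed to be available.
\usepackage{titlesec}
\titleformat{\section}{\Large\sffamily\bfseries}{\thesection}{1em}{}

\begin{document}

% Structure: the body says only "this is a section", not how it looks.
\section{Introduction}
Some text.

\section{Method}
More text.

\end{document}
```

Changing the \titleformat line restyles both headings without touching the body, which is roughly the role a CSS rule plays for a heading element.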

In the HTML world, new tags can be added and will be ignored by renderers which do not know them. That's not the case for TeX: unknown control sequences are errors. So we can't just add new concepts and expect existing documents to work: this is really important. So the decisions made in 1994 still have importance for LaTeX today.
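To make the point about unknown control sequences concrete, here is a small sketch. The macro name \keyterm is hypothetical and chosen only for illustration; it is not part of the LaTeX kernel.

```latex
\documentclass{article}

% \keyterm is a hypothetical semantic macro.
% If this definition is removed, LaTeX stops with an
% "Undefined control sequence" error at the first use below;
% it does not skip the unknown command the way an HTML renderer
% skips an unknown tag.
\providecommand{\keyterm}[1]{\emph{#1}}

\begin{document}
In \TeX, \keyterm{glue} is stretchable space.
\end{document}
```

\providecommand defines the macro only if it does not already exist, which is one common way to let a document keep working whether or not some other file supplies the definition.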

ConTeXt is newer, and does separate out a lot more design than LaTeX 'out of the box'. ConTeXt also takes a different approach to stability than LaTeX, with a more active development outlook for the kernel. However, the ConTeXt approach is in some ways more like plain TeX than LaTeX, in the sense that ConTeXt keeps the design 'closer to the user' than LaTeX does.

Input and output

In the HTML world, a document is read entirely into memory to build the DOM for rendering. TeX does not work like that, at least unless we program it all ourselves. Instead, TeX reads a line and processes it before moving on to the next line. (LuaTeX can alter this, but I think even in ConTeXt it's still true that the TeX model is the main one.) As such, the approaches needed to alter appearance are very different.
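A small sketch of what this sequential model means in practice (the macro name \productname is hypothetical): a redefinition affects only the material that comes after it, because the earlier text has already been processed.

```latex
\documentclass{article}

% \productname is a hypothetical macro used only for this example.
\newcommand{\productname}{Widget}

\begin{document}

\productname{} one.   % typesets "Widget one."

% The redefinition takes effect from this point on; the line above
% has already been processed and is not revisited.
\renewcommand{\productname}{Gadget}

\productname{} two.   % typesets "Gadget two."

\end{document}
```

There is no stage at which a complete document tree exists and a stylesheet is applied to it afterwards, so anything resembling a CSS rule has to be in place before the content it should affect.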

A key thing to bear in mind when thinking about this area is what people want as output. In the TeX world, we are focused on high quality typesetting. As such, there will almost always be some manual adjustment of the design to reflect the realities of the content. That's not what happens in 'well written' HTML, and although it can be expressed in XML, it certainly breaks the strict separation. I and others would argue that this is no bad thing: you do need manual intervention to get the best results.

Tables

Tables are specifically mentioned in the question, and I think they deserve consideration on their own. In HTML, tables have been used for a variety of purposes. In TeX, there is a much more restricted approach to tables. Tables are famously complex beasts in the TeX world, and Knuth did point out that it's amazing that they work at all! In most typeset documents, tables are used mainly for 'formal tables', and these have a pretty restricted range of 'good' appearances. As such, there is less need to provide the full range of CSS-like controls.

As canaaerus says in his answer, the TeX world is managed not by a committee but by no-one, and so what gets implemented depends on what individual users want. There is a range of table packages out there for LaTeX, plus the ConTeXt approach, and the raw \halign in plain TeX. However, they are mainly trying to solve other problems, which tells you where the priority for users is.
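For the specific case of cell wrapping raised in the question, one possible approach is sketched below, assuming the array package. The column-type letter L is my own choice for the example, not a standard name. The wrapping rule is declared once in the preamble and then referred to by a single letter in the table, which is loosely comparable to a CSS class.

```latex
\documentclass{article}
\usepackage{array}

% Declare the wrapping behaviour once: a ragged-right paragraph cell
% whose width is given as the argument. "L" is an arbitrary letter
% picked for this sketch.
\newcolumntype{L}[1]{>{\raggedright\arraybackslash}p{#1}}

\begin{document}

\begin{tabular}{l L{5cm}}
  Term    & A long description that wraps inside its five-centimetre
            column instead of running off the page. \\
  Another & Short text. \\
\end{tabular}

\end{document}
```

This does not give full CSS-style control, but it does keep the wrapping rule out of the individual cells.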

Looking ahead

As a member of the LaTeX3 Project, I know that we certainly are discussing better separation of content and design. One issue that is worth bearing in mind here is that the HTML + CSS model does not always translate well into what we want for typesetting. There are some significant differences between the two areas, and that means it's never going to be as simple.

Any better approach has to work with TeX, both in code terms and for interface. We have experimental code to deal with the relationship between objects ('l3ldb'), plus the idea of 'templates' for design, both of which are in this area.
