[Tex/LaTex] When is it good practice to use \unskip

spacing, tex-core

One sees in source2e the command \unskip used in various places. My understanding, in general terms, is that the command is the equivalent of an \ignorespaces that works backwards, on the space behind it.

Where would it be recommended to use \unskip?

MWE for experimentation:

\documentclass{article}
\usepackage{lipsum}
\begin{document}
\lipsum[1]
\def\test{\leavevmode \unskip test  }
\def\testa{\leavevmode test\ignorespaces  }

\test testing  \test 

\testa testing  \testa test
\end{document}

Best Answer

In addition to @Joseph's points, things to be aware of:

\unskip acts on horizontal and vertical space, so it will remove vertical space if used between paragraphs.
Unlike \ignorespaces, which acts on the incoming token stream (it discards the space tokens that follow it), \unskip works on the actual lists inside boxes, after all tokenisation has happened and all commands have been executed.
\unskip cannot be used in outer vertical mode: once an item has been added to the main vertical list of a page it cannot be removed. So while in a minipage you can remove preceding vertical space with \unskip, on the main page you have to use \vskip-\lastskip to back up over the previous skip rather than actually removing it. This leaves breakable glue, so you may also need to insert some penalties (such as \nobreak) to prevent page breaking.
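To illustrate the second point in horizontal mode, here is a minimal sketch. The macros \starmark and \lead are hypothetical names made up for this illustration; they are not part of LaTeX.

```latex
\documentclass{article}
% Hypothetical macros, named only for this illustration.
% \unskip deletes the glue node that the preceding space put into the list.
\newcommand\starmark{\unskip\textsuperscript{*}}
% \ignorespaces discards the space tokens that follow it in the input.
\newcommand\lead[1]{\textbf{#1} \ignorespaces}
\begin{document}
word \starmark{} next   % the space before \starmark is removed: "word* next"

\lead{Note:}   text     % only the single space inside \lead survives
\end{document}
```

In the first line the space has already become glue by the time \unskip runs, so only a list operation can remove it. In the second, the spaces after the argument are still tokens waiting to be read, which is where \ignorespaces can act.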

Consider:

\documentclass{article}

\showoutput
\begin{document}

\setbox0\vbox{

\hbox{g}

\vskip 10pt

\hbox{b}
}

\showbox0


\setbox2\vbox{

\hbox{g}

\vskip 10pt

\unskip\nobreak\hbox{b}
}

\showbox2

\setbox4\vbox{

\hbox{g}

\vskip 10pt

\vskip-\lastskip\nobreak\hbox{b}
}

\showbox4


\hbox{g}

\vskip 10pt

\unskip\nobreak\hbox{b}


\stop

Box 0 is

\vbox(26.30554+0.0)x5.55557
.\hbox(4.30554+1.94444)x5.00002
..\OT1/cmr/m/n/10 g
.\glue 10.0
.\glue(\baselineskip) 3.11111
.\hbox(6.94444+0.0)x5.55557
..\OT1/cmr/m/n/10 b

But suppose (as in box 2) that the code adding b needs to remove the space above it. It could use \unskip, which literally removes the glue, resulting in

\vbox(16.30554+0.0)x5.55557
.\hbox(4.30554+1.94444)x5.00002
..\OT1/cmr/m/n/10 g
.\penalty 10000
.\glue(\baselineskip) 3.11111
.\hbox(6.94444+0.0)x5.55557
..\OT1/cmr/m/n/10 b

If, instead of removing it, negative space is added to compensate, as in box 4, then you get

\vbox(16.30554+0.0)x5.55557
.\hbox(4.30554+1.94444)x5.00002
..\OT1/cmr/m/n/10 g
.\glue 10.0
.\glue -10.0
.\penalty 10000
.\glue(\baselineskip) 3.11111
.\hbox(6.94444+0.0)x5.55557
..\OT1/cmr/m/n/10 b

This looks the same, but if the penalty were not 10000 there would be a feasible breakpoint at that position. That means that if the list were unboxed and broken there, the end of the first part would have depth 0 rather than the depth of g, which can have subtle (or not so subtle) effects on positioning that are hard to correct (or at least hard to remember to correct).

So the issues surrounding \unskip are a lot simpler than those of the \vskip-\lastskip combination. However, if you are not in a box you don't have a choice, as the last version, on the main page, produces:

! You can't use `\unskip' in vertical mode.
l.46 \unskip
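One way to package the two cases is a helper that tests the mode first. This is only a sketch: \removelastvskip is a hypothetical name, not a LaTeX command, and in the outer case the original glue stays in the list as a possible breakpoint, as discussed above.

```latex
\documentclass{article}
% Hypothetical helper, not part of LaTeX.
\newcommand\removelastvskip{%
  \ifvmode
    \ifinner
      \unskip            % internal vertical mode: the glue node is deleted
    \else
      \vskip-\lastskip   % main vertical list: add cancelling glue instead
    \fi
  \fi}
\begin{document}
% Outer vertical mode: the 10pt glue stays, followed by -10pt.
\hbox{g}
\vskip 10pt
\removelastvskip
\hbox{b}

% Inside a minipage (internal vertical mode): the 10pt glue is removed.
\begin{minipage}{3cm}
\hbox{g}
\vskip 10pt
\removelastvskip
\hbox{b}
\end{minipage}
\end{document}
```

The helper deliberately does nothing outside vertical mode. Whether a \nobreak is also needed after it depends on where the macro is used.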

Related Solutions

[Tex/LaTex] Good practice on spacing

You want to keep in mind all the irregularities in your text: places where superficially uniform text turns out not to be uniform. You give periods in abbreviations as examples; the reason they are is that a period does not usually go inside a sentence, but in this case, it does (and only a speaker of idiomatic English could know that, not TeX). The differential in an integral is another example: it is the same text as the integrand, but it is not part of the integrand (in this case, I could imagine TeX being written to look out for this, but there are probably good reasons it doesn't).

You should, of course, also look out for constructions which are superficially different but in fact are not. These may center on certain TeX idioms: for example, if you were programming in plain TeX (which you are not, and so you should not actually write this ever) you might do the following:

The following {\it italic text} is not well-spaced.

If you set that, you may see that the word text is a little too close to "is". In the TeXbook, Knuth reminds you to put in an "italic correction". However, now that LaTeX has \textit{...}, which takes care of this, you probably never even learned what an italic correction is. So this is a non-example. The point remains, however, that certain TeX constructions break the flow of the text (in particular, grouping) and you need to pay attention to the typeset result to see if they broke the spacing.
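For comparison, a small sketch of the three variants in LaTeX. It uses \itshape in place of the plain TeX \it from the example above, which is my substitution.

```latex
\documentclass{article}
\begin{document}
% No italic correction: the last italic letter may lean into the next word.
The following {\itshape italic text} is set without a correction.

% Explicit italic correction with \/ before the group ends.
The following {\itshape italic text\/} has a manual correction.

% \textit inserts the correction automatically where it is appropriate.
The following \textit{italic text} lets \LaTeX{} handle it.
\end{document}
```

The difference is small for a word ending in t and more visible after letters with a strong overhang, such as f.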

Vertical space can also be an issue, and harder to deal with. Knuth also warns against using tall symbols in the text (like \frac{1}{2} instead of 1/2) because they force the lines apart. Thus, you need to scrutinize all the inline math you write for tall symbols, and consider using displayed equations. Sometimes you can work around this using \smash if you know there is space and TeX doesn't.
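A small sketch for comparing the three options side by side. The 6cm width and the filler sentences are arbitrary choices, and whether the lines visibly move apart depends on the font size and on what sits on the neighbouring lines.

```latex
\documentclass{article}
\begin{document}
\begin{minipage}{6cm}
% Built-up fraction: its height and depth can force the lines apart.
Some filler text before the formula, then $\frac{1}{2}$ in the middle of
the paragraph, and some more filler text after it to fill the lines.

% Slashed form: fits within the normal line height.
Some filler text before the formula, then $1/2$ in the middle of
the paragraph, and some more filler text after it to fill the lines.

% \smash hides the height and depth from TeX, so the line spacing is
% unchanged, but the fraction may collide with adjacent lines.
Some filler text before the formula, then $\smash{\frac{1}{2}}$ in the
middle of the paragraph, and some more filler text after it.
\end{minipage}
\end{document}
```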

Inline math causes another problem with TeX's line breaking algorithm, because it won't break at a lot of places in an equation, commas being the notorious example. Thus, write $a$, $b$, and $c$ rather than $a, b, \text{ and } c$ or even $a, b$, and $c$. Knuth also wants you to put a tie in: and~$c$; I confess that I never use ties. Like manual spacing corrections in equations, they seem like they should be reserved for final polishing (I mean, if Dr. House is in the middle of a line, it's not going to break).
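The three ways of writing the list can be tried in a narrow column, where line breaks are likely to fall inside it. The 4cm width is an arbitrary assumption, and amsmath is loaded only to provide \text.

```latex
\documentclass{article}
\usepackage{amsmath} % only needed for \text in the first variant
\begin{document}
\begin{minipage}{4cm}
% One formula: TeX does not break after the commas inside math.
Consider the three quantities $a, b, \text{ and } c$ in this formula.

% Separate formulas: the interword spaces between them are break points.
Consider the three quantities $a$, $b$, and $c$ in this formula.

% With a tie: the line cannot break just before the last formula.
Consider the three quantities $a$, $b$, and~$c$ in this formula.
\end{minipage}
\end{document}
```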

In short, you need to watch for scope changes, mode changes, and changes in "semantic scope", where the last one is totally impossible to communicate to TeX and the other two are still insidious. However, you should not be afraid to "just try it" and see whether you really do have a problem. It is much faster to let TeX do whatever it does (and with TeX, "whatever it does" is sometimes all you can say easily) than to try to anticipate it.

[Tex/LaTex] Whatsits: when are they used in practice

Whether whatsits are aptly named is a matter of opinion; I think they would fit better in TeX's semantics if they were called afterallnodes. Whatsits represent commands whose execution is delayed, or special commands that are associated with a particular device or system and are not part of TeX's normal processing flow.

It is interesting to investigate Knuth's rationale for introducing them. In a meeting with NTG members on March 13th, 1996 Knuth in reply to a question said:

I tried to make the programs so that they would have logical structure and it would be easy to throw in new features. This hasn’t happened anywhere near as often as I thought because people were more interested, I think, in inter-changeability of what they do; once you have your own program, then other people don’t have it. Still, if I were a large publisher, and I were to get special projects— some encyclopaedia, some new edition of the Bible, things like that — I would certainly think that the right thing to do would be to hire a good programmer and make a special computer system just for this project. At least, that was my idea about the way people would do it. It seems that hasn’t happened very much, although in Brno I met a student who is well along on producing Acrobat format directly in TeX, by changing the code. And the Omega system that you mentioned, that’s 150,000 lines of change files [laughter].

I built in hooks so that every time TeX outputs a page, it could come to a whatsit node and a whatsit node could be something that was completely different in each version of TeX. So, when the program sees a whatsit node, it calls a special routine saying, ‘how do I typeset this whatsit node?’ It’ll look at the sub-type and the sub-type might be another sub-type put in as a demo or it might be a brand new sub-type.

A whatsit can appear in either a horizontal or a vertical list and has no dimensions. It signifies an operation that should be delayed because it doesn't fit into TeX's ordinary scheme of things. The paragraph builder and the page builder scan the lists submitted to them and execute certain types of whatsit. They are useful when associated with specific implementations.

The more common whatsits are the ones associated with the main vertical list:

(a) Delayed writes generated by \write. The token list of a delayed \write is not written out until the material surrounding the \write reaches the output routine, where a \shipout is executed. Therefore, the write token list has to be stored on the main vertical list.

(b) Specials that use the \special command. The token list of a \special is stored with the main vertical list because it needs to be written to the dvi file. This happens, as in the case of \write, at the time of shipout.
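The delayed behaviour in (a) can be observed with a small sketch. The stream name \demoout and the .demo extension are hypothetical choices made for this illustration.

```latex
\documentclass{article}
\newwrite\demoout                          % hypothetical stream name
\immediate\openout\demoout=\jobname.demo   % hypothetical file extension
\begin{document}
First page.
\write\demoout{delayed, page \thepage}%    stored as a whatsit until \shipout
\immediate\write\demoout{immediate, page \thepage}% executed right away
\newpage
Second page.
\write\demoout{delayed, page \thepage}%    written when page 2 is shipped out
\end{document}
```

In the resulting file the immediate line comes before the first delayed line, even though it appears later in the source, because the delayed \write only runs when its page is shipped out. The file is left for TeX to close at the end of the run, so that it is still open when the last page is shipped.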

Practical implementations can be found in PostScript, PDF and color drivers and in graphics programs. An interesting read is always the hyperref manual. The package uses \special commands extensively to implement the interface between TeX commands and the PDF page description language. They are very simple to write:

    \immediate\special{!pdfpagelabels #2}%

To summarize, it is a free-for-all hook/interface. Why they were called whatsits -- my guess is that it was a Knuth (à la \fi) whatisit. This simple, innocent \special command enabled TeX to survive and adapt over the years, producing output from PostScript to PDF and introducing color and graphics.
