You want to keep in mind all the irregularities in your text: places where superficially uniform text turns out not to be uniform. You give periods in abbreviations as an example; they qualify because a period does not usually occur inside a sentence, but in this case it does (and only a speaker of idiomatic English could know that, not TeX). The differential in an integral is another example: it is set in the same kind of text as the integrand, but it is not part of the integrand (in this case, I could imagine TeX being written to look out for this, but there are probably good reasons it doesn't).
You should, of course, also look out for constructions which are superficially different but in fact are not. These may center on certain TeX idioms: for example, if you were programming in plain TeX (which you are not, and so you should not actually write this ever) you might do the following:
The following {\it italic text} is not well-spaced.
If you set that, you may see that the word "text" is a little too close to "is". In the TeXbook, Knuth reminds you to put in an "italic correction". However, now that LaTeX has \textit{...}, which takes care of this, you probably never even learned what an italic correction is. So this is a non-example. The point remains, however, that certain TeX constructions break the flow of the text (in particular, grouping) and you need to pay attention to the typeset result to see if they broke the spacing.
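To see the effect for yourself, a minimal sketch along the following lines may help. It assumes a standard article class and reuses the sample sentence from above; the second and third variants are my own additions for comparison.

```latex
\documentclass{article}
\begin{document}
% Font switch inside a group, no italic correction:
% the final "t" may lean into the following word.
The following {\itshape italic text} is not well-spaced.

% Same switch with an explicit italic correction \/ before the closing brace.
The following {\itshape italic text\/} is better spaced.

% \textit inserts the italic correction itself where it is appropriate.
The following \textit{italic text} is handled by \LaTeX.
\end{document}
```

Comparing the three lines at high magnification should show whether the font in use actually needs the correction.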
Vertical space can also be an issue, and is harder to deal with. Knuth also warns against using tall symbols in the text (like \frac{1}{2} instead of 1/2) because they force the lines apart. Thus, you need to scrutinize all the inline math you write for tall symbols, and consider using displayed equations. Sometimes you can work around this using \smash if you know there is space and TeX doesn't.
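The three options can be compared side by side. This is only a sketch, assuming a standard article class; the filler sentences are placeholders, and whether the fraction really pushes the lines apart depends on the font size and line spacing in use.

```latex
\documentclass{article}
\begin{document}
% Built-up fraction in running text: may force extra space between lines.
Some filler text that runs long enough to wrap, then the fraction
$\frac{1}{2}$ in the middle of the paragraph, followed by more filler text
so that there are lines both above and below the formula.

% Slashed form: no taller than ordinary text, so line spacing stays even.
Some filler text that runs long enough to wrap, then the fraction
$1/2$ in the middle of the paragraph, followed by more filler text
so that there are lines both above and below the formula.

% \smash makes TeX treat the fraction as having no height or depth.
% Only safe when you can see that the neighbouring lines leave room for it.
Some filler text that runs long enough to wrap, then the fraction
$\smash{\frac{1}{2}}$ in the middle of the paragraph, followed by more filler
text so that there are lines both above and below the formula.
\end{document}
```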
Inline math causes another problem with TeX's line-breaking algorithm, because it won't break at a lot of places in an equation, commas being the notorious example. Thus, write $a$, $b$, and $c$ rather than $a, b, \text{ and } c$ or even $a, b$, and $c$. Knuth also wants you to put a tie in: and~$c$; I confess that I never use ties. Like manual spacing corrections in equations, they seem like they should be reserved for final polishing (I mean, if Dr. House is in the middle of a line, it's not going to break).
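Put together as a compilable file, the variants look like this. It is a small sketch assuming the article class, with amsmath loaded only because the discouraged variant uses \text; the surrounding sentence is made up.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text, needed only for the discouraged variant
\begin{document}
% Preferred: three separate formulas, so TeX may break the line after a comma.
Let $a$, $b$, and $c$ be integers.

% Discouraged: one formula; TeX does not break after commas inside math.
Let $a, b, \text{ and } c$ be integers.

% Knuth's suggestion: the tie keeps "and" and the last symbol on one line.
Let $a$, $b$, and~$c$ be integers.
\end{document}
```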
In short, you need to watch for scope changes, mode changes, and changes in "semantic scope", where the last one is totally impossible to communicate to TeX and the other two are still insidious. However, you should not be afraid to "just try it" and see whether you really do have a problem. It is much faster to let TeX do whatever it does (and with TeX, "whatever it does" is sometimes all you can say easily) than to try to anticipate it.
The only way to accomplish the task is to make - an active character and define it in such a way that it expands to a minus sign in math mode while, in text mode, it looks ahead to see whether one or two hyphens follow it and acts accordingly.
A possible implementation with the active hyphen is as follows:
\makeatletter
\def\ah@hyphen{-}
\def\ah@endash{--}
\def\ah@emdash{---}
\catcode`\-=\active
\protected\def-{\ifmmode\ah@hyphen\else\expandafter\ah@check\fi}
\def\ah@check{\@ifnextchar-{\ah@checki}{\ah@hyphen}}
\def\ah@checki#1{\@ifnextchar-{\ah@three}{\ah@two}}
\def\ah@two{\unskip~\ah@endash\space\ignorespaces}
\def\ah@three#1{\unskip~\ah@emdash\space\ignorespaces}
\makeatother
There is, however, a way out using Unicode characters. If your document is written in UTF-8 you can say
\usepackage{newunicodechar}
\newunicodechar{–}{\unskip~--\space\ignorespaces}
\newunicodechar{—}{\unskip~---\space\ignorespaces}
where in line 2 – is U+2013 EN DASH and in line 3 — is U+2014 EM DASH; using these characters in your source will do what you want. The main problem here is that they are almost indistinguishable from each other in a monospaced font. Just to show them I'll put them in a code box:
– U+2013 EN DASH
— U+2014 EM DASH
and here's how they appear in a quotation box:
– U+2013 EN DASH
— U+2014 EM DASH
The rendering on screen depends on the font, of course.
Best Answer
"In most text typefaces, em dashes have no side bearings, which make them appear very close to the words they separate" (James Felici)
The main problem is that, due to the stems in some characters, the dash does not look as close to them as it does to other characters. For example:
The dash looks much more separated from the 1 than from the 6. But if you draw the boxes around each char, you can see that it touches both boxes:
You can see now that the problem is in the "1", which has too much white space at its right. This is intentional, so that all digits have the same width. But in another font it could be different.
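If you want to check this with your own font, one way is to frame each glyph with no padding. The following is only a sketch under my own assumptions (article class, an en dash between the digits, \fbox with \fboxsep set to 0pt and a hairline rule); it is not necessarily how the figures above were made.

```latex
\documentclass{article}
\begin{document}
% Plain text: the dash sits between the two digits.
1--6

% Frame each glyph tightly so its box, not just its ink, becomes visible.
% Note that putting the glyphs in separate boxes disables any kerning
% between them, and each frame adds its own rule thickness.
{\setlength{\fboxsep}{0pt}%    no padding between glyph and frame
 \setlength{\fboxrule}{0.1pt}% hairline frame
 \fbox{1}\fbox{--}\fbox{6}}
\end{document}
```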
The problem depends on the font and on each character in the font. Thus, it is a kerning issue. Each font should define an appropriate kerning between each character and the dashes, so that this kind of effect is not noticeable. Unfortunately, kerning information is stored in the font, and you cannot (easily) modify it from TeX (see here).
By the way, just in case someone cares, the code used to produce the above figures is the following: