Line Breaking – Preventing Break Before En Dash and Em Dash

Tags: best-practices, line-breaking, punctuation

In Polish typography, a dash (Polish: myślnik) must not be placed right after a line break, i.e. it must never start a line.
Below are incorrectly and correctly typeset samples using the en dash (Polish: półpauza) and the em dash (Polish: pauza).

\documentclass[12pt]{article}
\usepackage[paperwidth=95mm,paperheight=55mm,margin=5mm,right=24mm,marginparsep=5mm]{geometry}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{microtype}
\usepackage{xcolor}
\pagestyle{empty}
\begin{document}
% line break before a dash is a sin according to Polish typography rules
\leavevmode\marginpar{\textsc{\color{purple}źle\\(bad)}}%
To jest maciupeńki test półpauzy -- na Zachodzie nazywanej \emph{en dash}.  {\color{orange}\hfill~--}
\par \emph{Em dash} za to nazywamy pauzą --- obecnie dość rzadko spotykana. {\color{orange}\hfill~---}
\vfill
% line break after a dash -- this is the way it should be done
\leavevmode\marginpar{\textsc{\color{teal}dobrze\\(good)}}%
To jest maciupeńki test półpauzy~-- na Zachodzie nazywanej \emph{en dash}.  {\color{orange}\hfill~--}
\par \emph{Em dash} za to nazywamy pauzą~--- obecnie dość rzadko spotykana. {\color{orange}\hfill~---}
\end{document}


To obtain the correct result, I had to put a non-breaking space (a tie, ~) before each dash.

Is it possible to fix the behavior of all en/em dashes surrounded by normal spaces in a LaTeX document automatically?

Side note: I am not asking about workarounds requiring preprocessing, like using s/ -- /~-- / in Vim/sed/perl/etc.

Best Answer

The only way to accomplish the task is to make - an active character and define it so that it expands to a minus sign in math mode, while in text mode it looks ahead to see whether one or two more hyphens follow and acts accordingly. Be aware that an active hyphen is likely to break other uses of -, for example negative lengths such as \hspace{-3pt} in text mode.

A possible implementation with the active hyphen is as follows:

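\makeatletter
% Store the ordinary (catcode 12) hyphen sequences before - becomes active
\def\ah@hyphen{-}
\def\ah@endash{--}
\def\ah@emdash{---}
\catcode`\-=\active
% Math mode: plain minus sign; text mode: look ahead for further hyphens
\protected\def-{\ifmmode\ah@hyphen\else\expandafter\ah@check\fi}
% One hyphen seen: if no second one follows, typeset a plain hyphen
\def\ah@check{\@ifnextchar-{\ah@checki}{\ah@hyphen}}
% #1 swallows the second hyphen; a third one means em dash, otherwise en dash
\def\ah@checki#1{\@ifnextchar-{\ah@three}{\ah@two}}

% Remove the preceding space, then insert a tie, the dash and a normal space
\def\ah@two{\unskip~\ah@endash\space\ignorespaces}
% #1 swallows the third hyphen
\def\ah@three#1{\unskip~\ah@emdash\space\ignorespaces}
\makeatother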

There is, however, a way out using Unicode characters. If your document is encoded in UTF-8, you can say

\usepackage{newunicodechar}
\newunicodechar{–}{\unskip~--\space\ignorespaces}
\newunicodechar{—}{\unskip~---\space\ignorespaces}

Here the character in line 2 is U+2013 EN DASH and the one in line 3 is U+2014 EM DASH; using these characters in your source will do what you want. The main problem is that they are almost indistinguishable from each other in a monospaced font. Just to show them, I'll put them in a code box:

– U+2013 EN DASH  
— U+2014 EM DASH

and here's how they appear in a quotation box:

– U+2013 EN DASH
— U+2014 EM DASH

The rendering on screen depends on the font, of course.
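
For completeness, here is a minimal sketch of a full document using the two \newunicodechar definitions. It assumes pdfLaTeX and a source file saved as UTF-8; the sample sentences are borrowed from the question, and the preamble is only one plausible setup, not the only one that works.

```latex
% Minimal sketch: assumes pdfLaTeX and a UTF-8 encoded source file.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{newunicodechar}

% \unskip removes the space typed before the dash, ~ inserts a
% non-breaking space instead, so no line break can occur before the dash.
% \space\ignorespaces yields exactly one breakable space after it.
\newunicodechar{–}{\unskip~--\space\ignorespaces}  % U+2013 EN DASH
\newunicodechar{—}{\unskip~---\space\ignorespaces} % U+2014 EM DASH

\begin{document}
To jest maciupeńki test półpauzy – na Zachodzie nazywanej \emph{en dash}.

\emph{Em dash} za to nazywamy pauzą — obecnie dość rzadko spotykana.
\end{document}
```

With these definitions every U+2013 or U+2014 in the source gets a space on both sides, so a number range typed with a Unicode en dash would come out spaced as well; typing such ranges with the ASCII ligature -- leaves them untouched, since only the Unicode characters are redefined.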