[TeX/LaTeX] How to automatically adjust the emdash length according to current language

babel fontspec ligatures polyglossia

In Russian typography the emdash, which is typed as the ligature --- in LaTeX, is 20% shorter than the standard emdash. The babel package takes care of this difference, so that --- prints a shorter emdash if russian is the current language. However, this switching mechanism works only with the latex.exe engine, whereas lualatex.exe and xelatex.exe typeset the long emdash in every case.

This is because under LaTeX \selectlanguage{russian} switches the current encoding from, say, OT1 or T1 to, say, T2A. As a result the Latin emdash comes from, say, the cmr font family, whereas the Russian emdash comes from the LH fonts (as a rule). Under LuaTeX or XeLaTeX \selectlanguage does not switch the current encoding (it remains EU2 for LuaTeX or EU1 for XeTeX), so --- always comes from the same font.
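The encoding switch described above can be observed directly. The following is a minimal sketch for pdfLaTeX, assuming T2A-encoded fonts (for example cm-super or the LH fonts) are installed; \showenc is a hypothetical helper introduced only for this illustration.

```latex
% Compile with pdflatex (not xelatex/lualatex).
\documentclass{article}
\usepackage[T1,T2A]{fontenc}   % load a Latin and a Cyrillic encoding
\usepackage[utf8]{inputenc}
\usepackage[english,russian]{babel}

\makeatletter
% \showenc (hypothetical helper): prints the font encoding currently in use
\newcommand{\showenc}{\texttt{\f@encoding}}
\makeatother

\begin{document}
\selectlanguage{english}
Word --- word (encoding: \showenc)\par

\selectlanguage{russian}
Слово --- слово (encoding: \showenc)
\end{document}
```

If the setup matches these assumptions, the two lines should report different encodings, and the --- ligature in each line is resolved by a different font.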

Note however that most fonts contain dashes of different length and, thus, it is possible (in principle) to map --- to different code points depending on the current script.

My question is: how can this be done using the tools provided by the fontspec package?

Note that babel provides a shorthand "--- (if the russian option is given) which always typesets the 20% shorter emdash, but it also reduces the spaces around the emdash and prevents a line break after it.

UPDATE:
I have realised that there is no code point for the shorter emdash in modern OpenType fonts (in contrast to the METAFONT LH fonts used in legacy LaTeX for typesetting Cyrillic texts). Both polyglossia and babel with the russian option compose a shorter emdash from two endashes; they define a \cyrdash macro as follows

\def\cyrdash{\hbox to.8em{--\hss--}}

and map it to the shorthand "---. So the final form of my question is:

How can the ligature --- be mapped to \cyrdash? Is there any solution other than making the dash - an active character?
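The overlap construction behind \cyrdash can be tried in isolation under a Unicode engine. This is only a sketch: \shortemdash is a hypothetical name chosen here to avoid clashing with any package definition, the endashes are addressed directly as U+2013 so the result does not depend on TeX ligature mapping, and CMU Serif is an assumed font that happens to contain Cyrillic glyphs.

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{CMU Serif}% assumption: any installed font with Cyrillic glyphs

% \shortemdash (hypothetical name): two en dashes (U+2013) placed in a box
% 0.8em wide; \hss lets them overlap so the result is shorter than an em dash.
\newcommand{\shortemdash}{%
  \leavevmode\hbox to 0.8em{\char"2013\hss\char"2013}}

\begin{document}
Слово \shortemdash{} слово.\par  % composed short dash
Слово \char"2014{} слово.        % regular em dash (U+2014) for comparison
\end{document}
```

The two endashes only merge into one stroke when each is at least 0.4em wide and both sit at the same height, which depends on the font.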

Best Answer

As nplatis pointed out, this is actually the responsibility of polyglossia. It works for me, but the result is more or less font-dependent, because polyglossia simply overlaps two dashes typed with “-” (U+002D HYPHEN-MINUS), as can be inferred from the PT Sans example, where it does not work:

Russian em-dash example (XeLaTeX)

The key to getting this to work is to add the babelshorthands=true option to the language selection, so the code to create the above image is as follows:

%!TEX TS-program = xelatex
%!TEX encoding = UTF-8 Unicode
\documentclass{article}

\usepackage{polyglossia}
\setmainfont{PT Sans}
\setsansfont{PT Sans Caption}
\defaultfontfeatures{Scale=MatchLowercase, Mapping=tex-text}
\setdefaultlanguage[spelling=modern,babelshorthands=true]{russian}
\setotherlanguage{english}

\newfontfamily{\HT}[Scale=MatchLowercase]{Hoefler Text}
\newfontfamily{\GPP}{Garamond Premier Pro}
\newfontfamily{\BV}{Baskerville}
\newfontfamily{\CCY}{Charcoal CY}

\newcommand{\text}{Слово "--- слово.\\Слово — слово.\par}
% Line 1 of \text: babel shorthand "---; line 2: literal U+2014 EM DASH

\begin{document}
PT Sans\\
\text
{\HT Hoefler Text\\
\text}
{\GPP Garamond Premier Pro\\
\text}
{\BV Baskerville\\
\text}
{\CCY Charcoal CY\\
\text}

\end{document}

For further information, see also this thread in the XeTeX mailing list. Also, babel apparently distinguishes several em-dashes for Russian:
"--- Cyrillic emdash in plain text.
"--~ Cyrillic emdash in compound names (surnames).
"--* Cyrillic emdash for denoting direct speech.
See here for more information.
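For sources that contain a literal U+2014 rather than the "--- shorthand, one alternative (not taken from the thread above, and a different technique from remapping the --- ligature) is to make only the U+2014 character active with the newunicodechar package and choose the dash by current language. The sketch below assumes babel under XeLaTeX or LuaLaTeX; \shortemdash is a hypothetical macro name and CMU Serif an assumed font with Cyrillic coverage.

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{CMU Serif}% assumption: any installed font with Cyrillic glyphs
\usepackage[english,russian]{babel}
\usepackage{newunicodechar}

% \shortemdash (hypothetical name): two overlapping en dashes, 0.8em wide
\newcommand{\shortemdash}{%
  \leavevmode\hbox to 0.8em{\char"2013\hss\char"2013}}

% Make the literal U+2014 character language-aware:
% short composed dash in Russian, the font's own em dash otherwise.
\newunicodechar{—}{%
  \iflanguage{russian}{\shortemdash}{\char"2014\relax}}

\begin{document}
Слово — слово.\par

\begin{otherlanguage}{english}
Word — word.
\end{otherlanguage}
\end{document}
```

This leaves the hyphen character - untouched, so the -- and --- ligatures keep their usual behaviour; only a typed U+2014 changes with the language.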


On a side note, I have not succeeded in getting the shorter em-dash in regular LaTeX, just the narrower spacing, strangely enough:

Russian em-dash example (LaTeX)

Code:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}

\begin{document}
\selectlanguage{russian}
\noindent
Слово "--- слово.\\
Слово --- слово.\\
Слово — слово.\par% U+2014 EM DASH
\end{document}