[TeX/LaTeX] LTR sequences within RTL text – alternative to cumbersome markup

right-to-left xepersian xetex

I am writing a book in Persian, but Latin-script text is scattered throughout it.

I have to wrap every Latin-script run in \lr{} to tell XeTeX to set the words from left to right, which is really cumbersome.

If I do not use \lr{}, One Two Three comes out as Three Two One in Persian documents, and یک دو سه comes out as سه دو یک in English documents.

This seems like a very basic requirement, and I wonder why XeTeX cannot handle it without extra markup.
Is there any way to avoid using \lr{}?
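
For reference, the markup in question looks like the following minimal sketch. It assumes a default xepersian setup with no fonts selected; a real document would normally also choose a Persian font (for example with xepersian's \settextfont), and the exact rendering depends on the installed fonts and package versions.

```latex
\documentclass{article}
\usepackage{xepersian}
\begin{document}
% Unmarked: as described above, the Latin words come out in reverse order.
این یک آزمایش است One Two Three و ادامه آن

% Marked up: \lr sets its argument left to right.
این یک آزمایش است \lr{One Two Three} و ادامه آن
\end{document}
```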

Best Answer

To begin with, this is not as basic as you claim. You may be able to do it with XeTeX's \XeTeXinterchartoks primitive.

The following example comes from a reply Jonathan Kew (the author of XeTeX) sent me a while ago; I only modified it to work with XePersian:

\documentclass{article}
\usepackage{xepersian}
\makeatletter
% classes 1-3 are used in unicode-letters.tex, so we'll put the Latin letters in 4
\newcount\xp@n
\xp@n=`\A \loop \XeTeXcharclass \xp@n=4 \ifnum\xp@n<`\Z \advance\xp@n by 1 \repeat
\xp@n=`\a \loop \XeTeXcharclass \xp@n=4 \ifnum\xp@n<`\z \advance\xp@n by 1 \repeat
% when we encounter class 4, we'll do \startlatin
\XeTeXinterchartoks 0 4 {\startlatin}
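% 255 is the text-boundary class here and below; newer XeTeX releases (0.99994 and later) use 4095 instead
\XeTeXinterchartoks 255 4 {\startlatin}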
% and when we encounter class 0, we'll do \finishlatin
\XeTeXinterchartoks 255 0 {\finishlatin}
\XeTeXinterchartoks 4 0 {\finishlatin}
\newcommand{\startlatin}{\if@Latin\else\bgroup\beginL\latinfont\@Latintrue\fi}
\newcommand{\finishlatin}{\if@Latin\unskip\endL\egroup{ }\fi}
\makeatother
\XeTeXinterchartokenstate=1
\begin{document}
این یک آزمایش است
One Two Three
و ادامه آن
\end{document}

Note that this changes both the font (to the Latin font) and the direction (to LTR).

However, I suspect you're not really going to be able to do this on a large scale, because it will be too difficult to handle things like punctuation and spacing at direction changes. In unidirectional text, it may not matter whether the "language switch" happens before or after the space (or punctuation mark), but with bidi it does matter. I think in the end you're still going to need markup if you want to reliably mix LR and RL scripts.

In addition, LR and RL scripts share some characters. For example, how would you decide whether ) or ( is an RL character or an LR one?
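
To make the ambiguity concrete, here is a small sketch that uses the \lr command from the question with the same preamble as the example above. The markup, not the characters themselves, records whether the parentheses or the full stop belong to the Latin run or to the surrounding Persian sentence. How each line renders depends on the fonts and package versions installed, so treat it as an illustration rather than a tested recipe.

```latex
\documentclass{article}
\usepackage{xepersian}
\begin{document}
% Parentheses inside \lr: they are set as part of the left-to-right Latin run.
این یک آزمایش است \lr{(One Two Three)} و ادامه آن

% Parentheses outside \lr: they stay part of the right-to-left Persian sentence.
این یک آزمایش است (\lr{One Two Three}) و ادامه آن

% The same choice arises for a full stop next to a Latin run.
این یک آزمایش است \lr{One Two Three.}

این یک آزمایش است \lr{One Two Three}.
\end{document}
```

An automatic switch based only on character classes has no way of knowing which of these the author meant, which is why explicit markup remains the reliable option.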

Alternatively, you could implement a preprocessor (written in C or any other language) that converts, say, test.tex to test1.tex and wraps all LR words in \lr. BiDiTeX already exists, so you may be able to get its sources and modify them to work with the bidi/XePersian packages.
