[TeX/LaTeX] Getting right-to-left output in Arabic and Persian/Farsi with pdfLaTeX

Tags: arabic, english, pdftex, persian, right-to-left

I have an English document in which I need to inject a few example words from many different languages, including Arabic and Persian.

I've sorta gotten it to work with the babel package and the \foreignlanguage{arabic}{الأحد} command, but the characters come out garbled, presumably because of the right-to-left (RTL) thing. If I manually reverse all the characters (\foreignlanguage{arabic}{دحألا}), they apparently do not join together the way they are supposed to… again, because of RTL.

The template/style I am forced to use compiles with pdflatex but NOT xelatex. Attempting to use the arabtex or bidi package breaks the template with a firehose of mind-exploding errors.

Any suggestions?

PS: copy-and-pasting the literal UTF-8 encoded tex snippet from my text editor seems to correct itself to RTL in this stackexchange editor, so I'm not sure I can give you the full picture of the problem I'm dealing with… 🙁

EDIT: here's an MWE:

\documentclass[10pt]{article}
\usepackage[usenames]{color} %used for font color
\usepackage{amssymb} %maths
\usepackage{amsmath} %maths
\usepackage{booktabs}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,bulgarian,greek,magyar,frenchb,german,english]{babel}
\usepackage{CJKutf8}

\begin{document}

\begin{tabular}{p{1.8cm}ccccccc}
\toprule
Language & $\rho$ & 1 & 2 & 3 & 4 & 5 & 6 \\
\midrule
German & 0.568 & weißt & überrascht & teppich & schwäche & kompetent & verbündet \\
Hungarian & 0.506 & tegyünk & recepciós & leírás & oktat & visszaveti & rengette \\
French & 0.500 & envoyer & vélo & randonnée & blessure & mixte & matérialisme \\
Bulgarian & 0.505 & \foreignlanguage{bulgarian}{време} & \foreignlanguage{bulgarian}{болка} & \foreignlanguage{bulgarian}{самотен} & \foreignlanguage{bulgarian}{съдружие} & \foreignlanguage{bulgarian}{надделеят} & \foreignlanguage{bulgarian}{уязвимите} \\
Greek & 0.491 & \foreignlanguage{greek}{πόρτα} & \foreignlanguage{greek}{πατινάζ} & \foreignlanguage{greek}{εξοχή} & \foreignlanguage{greek}{επεξεργάζομαι} & \foreignlanguage{greek}{ορίζοντας} & \foreignlanguage{greek}{εδαφικός} \\
Arabic & 0.512 & \foreignlanguage{arabic}{الأحد} & \foreignlanguage{arabic}{كحض} & \foreignlanguage{arabic}{ةرافسلا} & \foreignlanguage{arabic}{ةظتكملا} & \foreignlanguage{arabic}{يثراك} & \foreignlanguage{arabic}{ددب} \\
Korean & 0.495 & \begin{CJK}{UTF8}{mj}비가\end{CJK} & \begin{CJK}{UTF8}{mj}기억\end{CJK} & \begin{CJK}{UTF8}{mj}무서운\end{CJK} & \begin{CJK}{UTF8}{mj}따라서\end{CJK} & \begin{CJK}{UTF8}{mj}왜곡\end{CJK} & \begin{CJK}{UTF8}{mj}지배하는\end{CJK} \\
Chinese & 0.482 & \begin{CJK}{UTF8}{gbsn}星期三\end{CJK} & \begin{CJK}{UTF8}{gbsn}司机\end{CJK} & \begin{CJK}{UTF8}{gbsn}要求\end{CJK} & \begin{CJK}{UTF8}{gbsn}动态\end{CJK} & \begin{CJK}{UTF8}{gbsn}翻新\end{CJK} & \begin{CJK}{UTF8}{gbsn}锲而不舍\end{CJK} \\
Persian & 0.433 & \foreignlanguage{farsi}{روزنامه} & \foreignlanguage{farsi}{فروشگاه} & \foreignlanguage{farsi}{درد} & \foreignlanguage{farsi}{فکری} & \foreignlanguage{farsi}{تقویت} & \foreignlanguage{farsi}{نزدیکی} \\
Japanese & 0.326 & \begin{CJK}{UTF8}{min}月\end{CJK} & \begin{CJK}{UTF8}{min}スキー\end{CJK} & \begin{CJK}{UTF8}{min}祭り\end{CJK} & \begin{CJK}{UTF8}{min}正直\end{CJK} & \begin{CJK}{UTF8}{min}地質\end{CJK} & \begin{CJK}{UTF8}{min}撤退\end{CJK} \\
\bottomrule
\end{tabular}

\end{document}

The Arabic and Persian (Farsi) words render incorrectly for me.

UPDATE: Here is what the output looks like for me. As you can see, the Arabic and Persian (Farsi) words are reversed.

Best Answer

Short answer: Instead of \foreignlanguage{arabic} and \foreignlanguage{farsi}, use \AR and \FR.


Firstly, the MWE given in the question (at least as of the current revision) is most certainly not Minimal. Here is something shorter:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}

\begin{document}
Arabic \foreignlanguage{arabic}{كحض}

Persian \foreignlanguage{farsi}{فروشگاه}
\end{document}

which produces

[image: output from MWE]

where the Arabic and Persian texts are not typeset right-to-left as they should be.

Why this happens is easy to explain: the Unicode representation of the Arabic text كحض consists of the three code points

U+0643 ARABIC LETTER KAF (ك)
U+062D ARABIC LETTER HAH (ح)
U+0636 ARABIC LETTER DAD (ض)

These three code points are supposed to be laid out right-to-left (subject to further rules, such as contextual letter joining and ligatures), giving كحض. Instead, when they are naively placed left-to-right in the order they occur in the input (something like ك x ح x ض, where x separates the characters), you get the kind of incorrect output shown above. (The same applies to Persian.) What's missing, then, is an instruction telling TeX to lay these characters out right-to-left.

This appears to be a bug in the babel package's support for these languages. Some comments on related questions (1, 2) refer to a \textRL command: loading the babel package with \usepackage[arabic,farsi,english]{babel} as above does define a \textRL command, but it has a bug: \show\textRL shows that it expands to \expandafter \@farsi@R {#1}, so the language loaded second (here farsi) overrides the first.
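To reproduce that diagnostic yourself, a minimal sketch is to put \show\textRL in the body of the short MWE. \show writes the macro's definition to the terminal and the .log file; in pdflatex's default interactive mode it then pauses at a ? prompt (press Enter to continue), or you can compile with pdflatex -interaction=nonstopmode to skip the pause. This assumes the same babel/arabi setup as above; the exact definition printed may differ between package versions.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}

\begin{document}
% Print the current definition of \textRL to the terminal and the .log file.
% With both arabic and farsi loaded, the definition reported above is
% \expandafter \@farsi@R {#1}, i.e. the farsi version wins.
\show\textRL
\end{document}
```

The document body is otherwise empty, so pdflatex will report that no pages were output; only the log message matters here.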

A closer look at the log reveals that this \textRL command comes from the arabi package, which babel loads. The arabi documentation mentions this problem and says that \textRL is deprecated; instead, it recommends \AR and \FR for Arabic and Farsi respectively. So we can use those in our MWE:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}

\begin{document}
Arabic \AR{كحض}

Persian \FR{فروشگاه}
\end{document}

which correctly produces:

[image: fixed MWE output]

For the non-MWE in the question, we can just blindly replace \foreignlanguage{arabic} and \foreignlanguage{farsi} with \AR and \FR respectively, to get this output:

[image: output for non-MWE]
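As a sketch of what that replacement looks like, here are just the Arabic and Persian rows of the question's table (first two words of each), with \foreignlanguage{arabic} and \foreignlanguage{farsi} swapped for \AR and \FR. Only the packages these rows need are loaded; the other languages and the CJK rows are omitted for brevity. The words are copied verbatim from the question; if any of them were manually reversed as a workaround, they should be restored to their normal (logical) order once \AR or \FR handles the direction.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{booktabs}
\usepackage[arabic,farsi,english]{babel}

\begin{document}
% Reduced version of the question's table: only the right-to-left rows.
% \AR and \FR (defined by arabi, which babel loads) typeset their
% argument right-to-left in Arabic and Farsi respectively.
\begin{tabular}{p{1.8cm}ccc}
\toprule
Language & $\rho$ & 1 & 2 \\
\midrule
Arabic  & 0.512 & \AR{الأحد}   & \AR{كحض}     \\
Persian & 0.433 & \FR{روزنامه} & \FR{فروشگاه} \\
\bottomrule
\end{tabular}
\end{document}
```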
