[Tex/LaTex] Serbian Cyrillic using LuaTeX and XeTeX


This question is directly inspired by Martin Schroder's answer to this question. Namely, I am wondering how one would use LuaTeX or XeTeX to produce Serbian Cyrillic output (a little different from Russian) using an American keyboard layout. How would one produce the same output using a Serbian keyboard layout? The correct way to produce such output using the pdfTeX engine and an American keyboard layout is:

\documentclass{article}
\usepackage[OT2,T1]{fontenc}
\input{cyracc.def}
\newcommand\textcyr[1]{{\fontencoding{OT2}\fontfamily{wncyr}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{\cyracc
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T \'C U F Kh C Ch \Dzh\ Sh
} 
\end{document}

which gives the Serbian Cyrillic alphabet set in the wncyr font.

Of course, one can use inputenc to type with a Serbian keyboard. On the other hand, Babel unfortunately requires a Serbian keyboard, so for me personally it was not interesting.
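For comparison, here is a minimal pdfLaTeX sketch of the inputenc route, where a Serbian keyboard produces UTF-8 Cyrillic directly. It assumes that the T2A encoding (which covers Ђ, Ј, Љ, Њ, Ћ and Џ) and its fonts (cm-super or the LH fonts) are installed; \textsrcyr is only a hypothetical name modelled on \textcyr in the question.

\documentclass{article}
\usepackage[T2A,T1]{fontenc} % T1 remains the default encoding, T2A is loaded for Cyrillic
\usepackage[utf8]{inputenc}  % UTF-8 input, as typed on a Serbian keyboard
% hypothetical helper, analogous to \textcyr in the question
\newcommand\textsrcyr[1]{{\fontencoding{T2A}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textsrcyr{А Б В Г Д Ђ Е Ж З И Ј К Л Љ М Н Њ О П Р С Т Ћ У Ф Х Ц Ч Џ Ш}
\end{document}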

Best Answer

Here is a method for XeLaTeX.

Prepare a file ascii-to-serbian.map with the following content:

; TECkit mapping for TeX input conventions <-> Unicode characters

LHSName "ASCII-to-Serbian"
RHSName "UNICODE"

pass(Unicode)

; ligatures from Knuth's original CMR fonts
U+002D U+002D           <>  U+2013  ; -- -> en dash
U+002D U+002D U+002D    <>  U+2014  ; --- -> em dash

U+0027          <>  U+2019  ; ' -> right single quote
U+0027 U+0027   <>  U+201D  ; '' -> right double quote
U+0022           >  U+201D  ; " -> right double quote

U+0060          <>  U+2018  ; ` -> left single quote
U+0060 U+0060   <>  U+201C  ; `` -> left double quote

U+0021 U+0060   <>  U+00A1  ; !` -> inverted exclam
U+003F U+0060   <>  U+00BF  ; ?` -> inverted question

; additions supported in T1 encoding
U+002C U+002C   <>  U+201E  ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C   <>  U+00AB  ; << -> LEFT POINTING GUILLEMET
U+003E U+003E   <>  U+00BB  ; >> -> RIGHT POINTING GUILLEMET

U+0041 <> U+0410 ; A
U+0042 <> U+0411 ; B
U+0043 <> U+0426 ; C
U+0043 U+0048 <> U+0427 ; CH
U+0043 U+0068 <> U+0427 ; Ch
U+0043 U+0031 <> U+040B ; C1
U+0027 U+0043 <> U+040B ; 'C
U+0044 <> U+0414 ; D
U+0044 U+004A <> U+0402 ; DJ
U+0044 U+006A <> U+0402 ; Dj
U+0044 U+005A U+0048 <> U+040F ; DZH
U+0044 U+007A U+0068 <> U+040F ; Dzh
U+0044 U+0031 <> U+040F ; D1
U+0045 <> U+0415 ; E
U+0046 <> U+0424 ; F
U+0047 <> U+0413 ; G
U+0048 <> U+0425 ; H
U+0049 <> U+0418 ; I
U+004A <> U+0408 ; J
U+004B <> U+041A ; K
U+004B U+0048 <> U+0425 ; KH
U+004B U+0068 <> U+0425 ; Kh
U+004C <> U+041B ; L
U+004C U+004A <> U+0409 ; LJ
U+004C U+006A <> U+0409 ; Lj
U+004D <> U+041C ; M
U+004E <> U+041D ; N
U+004E U+004A <> U+040A ; NJ
U+004E U+006A <> U+040A ; Nj
U+004F <> U+041E ; O
U+0050 <> U+041F ; P
;U+0051 <> ; Q
U+0052 <> U+0420 ; R
U+0053 <> U+0421 ; S
U+0053 U+0048 <> U+0428 ; SH
U+0053 U+0068 <> U+0428 ; Sh
U+0054 <> U+0422 ; T
U+0055 <> U+0423 ; U
U+0056 <> U+0412 ; V
;U+0057 <> ; W
U+0058 <> U+0425 ; X
;U+0059 ; Y
U+005A <> U+0417 ; Z
U+005A U+0048 <> U+0416 ; ZH
U+005A U+0068 <> U+0416 ; Zh

U+0061 <> U+0430 ; a
U+0062 <> U+0431 ; b
U+0063 <> U+0446 ; c
U+0063 U+0068 <> U+0447 ; ch
U+0063 U+0031 <> U+045B ; c1
U+0027 U+0063 <> U+045B ; 'c
U+0064 <> U+0434 ; d
U+0064 U+006A <> U+0452 ; dj
U+0064 U+007A U+0068 <> U+045F ; dzh
U+0064 U+0031 <> U+045F ; d1
U+0065 <> U+0435 ; e
U+0066 <> U+0444 ; f
U+0067 <> U+0433 ; g
U+0068 <> U+0445 ; h
U+0069 <> U+0438 ; i
U+006A <> U+0458 ; j
U+006B <> U+043A ; k
U+006B U+0068 <> U+0445 ; kh
U+006C <> U+043B ; l
U+006C U+006A <> U+0459 ; lj
U+006D <> U+043C ; m
U+006E <> U+043D ; n
U+006E U+006A <> U+045A ; nj
U+006F <> U+043E ; o
U+0070 <> U+043F ; p
;U+0071 <> ; q
U+0072 <> U+0440 ; r
U+0073 <> U+0441 ; s
U+0073 U+0068 <> U+0448 ; sh
U+0074 <> U+0442 ; t
U+0075 <> U+0443 ; u
U+0076 <> U+0432 ; v
;U+0077 <> ; w
U+0078 <> U+0445 ; x
;U+0079 ; y
U+007A <> U+0437 ; z
U+007A U+0068 <> U+0436 ; zh

; Additional mappings (for the official Latin transliteration)
U+0110 <> U+0402 ; Đ
U+0111 <> U+0452 ; đ
U+017D <> U+0416 ; Ž
U+017E <> U+0436 ; ž
U+0106 <> U+040B ; Ć
U+0107 <> U+045B ; ć
U+010C <> U+0427 ; Č
U+010D <> U+0447 ; č
U+0044 U+017D <> U+040F ; DŽ
U+0044 U+017E <> U+040F ; Dž
U+0064 U+017E <> U+045F ; dž
U+0160 <> U+0428 ; Š
U+0161 <> U+0448 ; š

Then process it with

teckit_compile ascii-to-serbian.map

This will produce a file ascii-to-serbian.tec, which you can put anywhere XeTeX will find it (in the working directory, for instance). Then make the following test file:

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}

\begin{document}
Serbian alphabet again

\begin{serbian}
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T C1 U F Kh C Ch D1 Sh

a b v g d dj e zh z i j k l lj m n nj o p r s t c1 u f kh c ch d1 sh
\end{serbian} 
\end{document}

Running xelatex test.tex typesets both lines of the alphabet in Cyrillic.
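If you only need the mapping for short snippets, as with \textcyr in the question, a sketch without Polyglossia should also work, assuming the same .tec file and font; \textsr is a hypothetical command name. Note that no Serbian hyphenation patterns are activated this way.

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
% font family that passes its text through the compiled ascii-to-serbian.tec mapping
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
% hypothetical helper: typeset the argument with the mapped font
\newcommand\textsr[1]{{\serbianfont #1}}
\begin{document}
Serbian alphabet again \dots \textsr{A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T C1 U F Kh C Ch D1 Sh}
\end{document}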

Note 1: the characters Џ and џ can also be input as DZH (or Dzh) and dzh. If this is wrong for your text (it might lead to incorrect ligatures), remove the corresponding lines from ascii-to-serbian.map.

Note 2: if you find it inconvenient to type C1 and c1 to get Ћ and ћ, the lines

U+0027 U+0043 <> U+040B ; 'C

and

U+0027 U+0063 <> U+045B ; 'c

placed after the C1 and c1 entries (the map above already contains them) allow you to input the characters as 'C and 'c.

If you want to input them as \'C and \'c instead, insert the following code after loading the Serbian language with Polyglossia:

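% save the standard acute accent command
\let\standardcommandquote\'
% \'c and \'C become c1 and C1, which the TECkit map turns into ћ and Ћ;
% any other argument gets the ordinary accent (\strcmp is a XeTeX primitive)
\DeclareRobustCommand{\serbiancommandquote}[1]{%
  \ifnum\strcmp{#1}{c}=0 c1\else
    \ifnum\strcmp{#1}{C}=0 C1\else
      \standardcommandquote{#1}\fi\fi}
\makeatletter
% use the new \' inside Serbian text and restore the standard one outside it
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother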
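A usage sketch, assuming the test file above plus this code, and a Polyglossia version that still provides \blockextras@serbian, \inlineextras@serbian and \noextras@serbian: inside Serbian text \'C and \'c become C1 and c1 and then Ћ and ћ, while outside it \' keeps its usual meaning.

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
% redefine \' for Serbian text only
\let\standardcommandquote\'
\DeclareRobustCommand{\serbiancommandquote}[1]{%
  \ifnum\strcmp{#1}{c}=0 c1\else
    \ifnum\strcmp{#1}{C}=0 C1\else
      \standardcommandquote{#1}\fi\fi}
\makeatletter
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother
\begin{document}
% English text: ordinary acute accent, giving Ćuprija
\'Cuprija

% Serbian text: \'C and \'c go through the mapping, giving Ћуприја and ћао
\textserbian{\'Cuprija, \'cao}
\end{document}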

Note 3 (added Feb. 17): if Unicode input is available, then

Đ đ Ž ž Ć ć Č č DŽ Dž dž Š š

are also mapped to

Ђ ђ Ж ж Ћ ћ Ч ч Џ Џ џ Ш ш

respectively.
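For example, with the mapping and the \serbianfont family from the test file, Serbian Latin text can be typed directly; \textsr is again a hypothetical helper name. Following the rules at the end of the map, the argument should come out as Ђорђе, чаша, џеп, шума, Џамија.

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
% hypothetical helper: typeset the argument with the mapped font
\newcommand\textsr[1]{{\serbianfont #1}}
\begin{document}
% Latin letters with diacritics (UTF-8 input) are converted by the TECkit map
\textsr{Đorđe, čaša, džep, šuma, Džamija}
\end{document}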

Related Solutions

[Tex/LaTex] Drawbacks of XeTeX/LuaTeX

Math isn't the problem if you are happy with the "normal" math fonts used already by pdftex. This will work fine with xelatex + lualatex too. You can also try unicode-math but I don't know if it works in all cases.
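If you do want to try unicode-math, a minimal XeLaTeX or LuaLaTeX setup might look like the following; the choice of Latin Modern Math (shipped with TeX Live) and Linux Libertine O is just an assumption.

\documentclass{article}
\usepackage{unicode-math} % also loads fontspec
\setmainfont{Linux Libertine O}
\setmathfont{Latin Modern Math} % an OpenType math font
\begin{document}
Text and math: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.
\end{document}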

The multilanguage support is more problematic: as you are using different scripts (Greek, Russian), you can't use babel (at least for these languages), as it will break the Unicode font support. So you need polyglossia, and this doesn't work with lualatex yet, as it uses (at least for some languages) XeTeX-specific commands like \XeTeXinterclass. Also, the support files of some languages (e.g. French) are much more sophisticated in the babel version. It is possible to mix babel and polyglossia, but whether and how well it works depends a lot on the actual language combination.
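A minimal polyglossia sketch for the Greek/Russian case under XeLaTeX might look like this. It assumes that Linux Libertine O is installed (it contains Greek and Cyrillic glyphs); defining \greekfont and \cyrillicfont explicitly tells polyglossia which font to use for each script.

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
% fonts polyglossia switches to for Greek and Cyrillic text
\newfontfamily\greekfont{Linux Libertine O}
\newfontfamily\cyrillicfont{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguages{greek,russian}
\begin{document}
English text, \textgreek{ελληνικά} and \textrussian{русский текст}.
\end{document}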

Regarding microtype support: the newest version of xetex can do protrusion (I haven't tried it yet), and lualatex can do protrusion and expansion. The author of microtype has just announced on comp.text.tex (c.t.t.) that a preliminary version of microtype exists which supports both engines.
But at least for lualatex it isn't needed: one can activate both manually without problems:

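\documentclass[fontsize=12pt]{scrartcl}
% enable protrusion and expansion in the engine; the actual values come from the font feature below
\pdfprotrudechars1
\pdfadjustspacing1

\usepackage{lipsum}
\usepackage{fontspec}
% font feature that activates luaotfload's default protrusion and expansion settings
\newfontfeature{Microtype}{protrusion=default;expansion=default;}
\setmainfont[Microtype,Ligatures=TeX]{Linux Libertine O}
\begin{document}
\lipsum
\end{document}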
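With a microtype release that supports the Unicode engines (the preliminary version mentioned above, or a later one), the manual settings can presumably be replaced by loading the package. As far as I know, recent microtype versions provide protrusion and expansion under LuaTeX but only protrusion under XeTeX. A sketch under that assumption:

\documentclass[fontsize=12pt]{scrartcl}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
% assumes a microtype version with LuaTeX/XeTeX support
\usepackage{microtype}
\usepackage{lipsum}
\begin{document}
\lipsum
\end{document}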

[Tex/LaTex] LuaTeX cyrillic hyphenation problems

I just realized that the missing hyphenation could be caused by character codes. The latest LuaTeX versions from TeX Live come with a file luatex-unicode-letters.tex that sets the lowercase codes (\lccode) for non-Latin letters, and chances are that it is not in MiKTeX (yet). I put a copy of that file up under the link above; you could try to \input that file, perhaps it fixes things.
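A quick way to check whether the lowercase codes are the problem is to print one of them to the terminal and log under LuaLaTeX. For Cyrillic Д (U+0414) the expected value is 1076, the code of д (U+0434); a value of 0 means that no lowercase code is set, which can prevent TeX from hyphenating words containing that letter. The commented \input line assumes luatex-unicode-letters.tex is on the TeX search path.

\documentclass{article}
% uncomment if the check below reports 0 (assumes the file can be found)
%\input luatex-unicode-letters.tex
% report the lowercase code of U+0414 (Cyrillic capital De)
\typeout{lccode of U+0414: \the\lccode"0414}
\begin{document}
Test.
\end{document}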
