[Tex/LaTex] Serbian Cyrillic using LuaTeX and XeTeX

cyrillickeyboardluatexxetex

This question is directly inspired by Martin Schroder's answer to this question. Namely, I am wondering how would one use LuaTeX or XeTeX to produce Serbian (little bit different than Russian) Cyrillic output using American keyboard layout? How would produce the same output using Serbian keyboard layout? The correct way to produce such output using pdfTeX engine and American keyboard layout is:

\documentclass{article}
\usepackage[OT2,T1]{fontenc}
\input{cyracc.def}
\newcommand\textcyr[1]{{\fontencoding{OT2}\fontfamily{wncyr}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{\cyracc
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T \'C U F Kh C Ch \Dzh\ Sh
} 
\end{document}

which gives

enter image here

One can use of course inputenc to use Serbian keyboard. On another hand Babel unfortunately requires Serbian keyboard so for me personally was not interesting.

Best Answer

Here is a method for XeLaTeX.

Prepare a file ascii-to-serbian.map with the following content:

; TECkit mapping for TeX input conventions <-> Unicode characters

LHSName "ASCII-to-Serbian"
RHSName "UNICODE"

pass(Unicode)

; ligatures from Knuth's original CMR fonts
U+002D U+002D           <>  U+2013  ; -- -> en dash
U+002D U+002D U+002D    <>  U+2014  ; --- -> em dash

U+0027          <>  U+2019  ; ' -> right single quote
U+0027 U+0027   <>  U+201D  ; '' -> right double quote
U+0022           >  U+201D  ; " -> right double quote

U+0060          <>  U+2018  ; ` -> left single quote
U+0060 U+0060   <>  U+201C  ; `` -> left double quote

U+0021 U+0060   <>  U+00A1  ; !` -> inverted exclam
U+003F U+0060   <>  U+00BF  ; ?` -> inverted question

; additions supported in T1 encoding
U+002C U+002C   <>  U+201E  ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C   <>  U+00AB  ; << -> LEFT POINTING GUILLEMET
U+003E U+003E   <>  U+00BB  ; >> -> RIGHT POINTING GUILLEMET

U+0041 <> U+0410 ; A
U+0042 <> U+0411 ; B
U+0043 <> U+0426 ; C
U+0043 U+0048 <> U+0427 ; CH
U+0043 U+0068 <> U+0427 ; Ch
U+0043 U+0031 <> U+040B ; C1
U+0027 U+0043 <> U+040B ; 'C
U+0044 <> U+0414 ; D
U+0044 U+004A <> U+0402 ; DJ
U+0044 U+006A <> U+0402 ; Dj
U+0044 U+005A U+0048 <> U+040F ; DZH
U+0044 U+007A U+0068 <> U+040F ; Dzh
U+0044 U+0031 <> U+040F ; D1
U+0045 <> U+0415 ; E
U+0046 <> U+0424 ; F
U+0047 <> U+0413 ; G
U+0048 <> U+0425 ; H
U+0049 <> U+0418 ; I
U+004A <> U+0408 ; J
U+004B <> U+041A ; K
U+004B U+0048 <> U+0425 ; KH
U+004B U+0068 <> U+0425 ; Kh
U+004C <> U+041B ; L
U+004C U+004A <> U+0409 ; LJ
U+004C U+006A <> U+0409 ; Lj
U+004D <> U+041C ; M
U+004E <> U+041D ; N
U+004E U+004A <> U+040A ; NJ
U+004E U+006A <> U+040A ; Nj
U+004F <> U+041E ; O
U+0050 <> U+041F ; P
;U+0051 <> ; Q
U+0052 <> U+0420 ; R
U+0053 <> U+0421 ; S
U+0053 U+0048 <> U+0428 ; SH
U+0053 U+0068 <> U+0428 ; Sh
U+0054 <> U+0422 ; T
U+0055 <> U+0423 ; U
U+0056 <> U+0412 ; V
;U+0057 <> ; W
U+0058 <> U+0425 ; X
;U+0059 ; Y
U+005A <> U+0417 ; Z
U+005A U+0048 <> U+0416 ; ZH
U+005A U+0068 <> U+0416 ; Zh

U+0061 <> U+0430 ; a
U+0062 <> U+0431 ; b
U+0063 <> U+0446 ; c
U+0063 U+0068 <> U+0447 ; ch
U+0063 U+0031 <> U+045B ; c1
U+0027 U+0063 <> U+045B ; 'c
U+0064 <> U+0434 ; d
U+0064 U+006A <> U+0452 ; dj
U+0064 U+007A U+0068 <> U+045F ; dzh
U+0064 U+0031 <> U+045F ; d1
U+0065 <> U+0435 ; e
U+0066 <> U+0444 ; f
U+0067 <> U+0433 ; g
U+0068 <> U+0445 ; h
U+0069 <> U+0438 ; i
U+006A <> U+0458 ; j
U+006B <> U+043A ; k
U+006B U+0068 <> U+0445 ; kh
U+006C <> U+043B ; l
U+006C U+006A <> U+0459 ; lj
U+006D <> U+043C ; m
U+006E <> U+043D ; n
U+006E U+006A <> U+045A ; nj
U+006F <> U+043E ; o
U+0070 <> U+043F ; p
;U+0071 <> ; q
U+0072 <> U+0440 ; r
U+0073 <> U+0441 ; s
U+0073 U+0068 <> U+0448 ; sh
U+0074 <> U+0442 ; t
U+0075 <> U+0443 ; u
U+0076 <> U+0432 ; v
;U+0077 <> ; w
U+0078 <> U+0445 ; x
;U+0079 ; y
U+007A <> U+0437 ; z
U+007A U+0068 <> U+0436 ; zh

; Additional (for official translitteration)
U+0110 <> U+0402 ; Đ
U+0111 <> U+0452 ; đ
U+017D <> U+0416 ; Ž
U+017E <> U+0436 ; ž
U+0106 <> U+040B ; Ć
U+0107 <> U+045B ; ć
U+010C <> U+0427 ; Č
U+010D <> U+0447 ; č
U+0044 U+017D <> U+040F ; DŽ
U+0044 U+017E <> U+040F ; Dž
U+0064 U+017E <> U+045F ; dž
U+0160 <> U+0428 ; Š
U+0161 <> U+0448 ; š

Then process it with

teckit_compile ascii-to-serbian.map

This will produce a file ascii-to-serbian.tec that you can put anywhere XeTeX will find it (in the working directory, for instance). Then make the following test file:

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}

\begin{document}
Serbian alphabet again

\begin{serbian}
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T C1 U F Kh C Ch D1 Sh

a b v g d dj e zh z i j k l m n nj o p r s t c1 u f kh c ch d1 sh
\end{serbian} 
\end{document}

Sample output after xelatex test.tex

enter image description here

Note 1: the characters Џ and џ can be input also as DZH (or Dzh) and dzh. If this is incorrect (it might bring to incorrect ligatures) then remove the corresponding lines from ascii-to-serbian.map.

Note 2: if you find it inconvenient to type C1 and c1 to get Ћ and ћ, you can add the lines

U+0027 U+0043 <> U+040B ; 'C

and

U+0027 U+0063 <> U+040B ; 'c

after the C1 and c1 entries. This will allow you to input the characters as 'C and 'c.

If you want to input them as \'C and \'c, then insert this code after having loaded the Serbian language with Polyglossia

\let\standardcommandquote\'
\DeclareRobustCommand{\serbiancommandquote}[1]{%
  \ifnum\strcmp{#1}{c}=0 c1\else
    \ifnum\strcmp{#1}{C}=0 C1\else
      \standardcommandquote{#1}\fi\fi}
\makeatletter
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother

Note 3 (added Feb. 17): If one has available Unicode input, then also

Đ đ Ž ž Ć ć Č č DŽ Dž dž Š š

are mapped to

Ђ ђ Ж ж Ћ ћ Ч ч Џ џ Ш ш

respectively.