Here is a method for XeLaTeX. Prepare a file ascii-to-serbian.map with the following content:
; TECkit mapping for TeX input conventions <-> Unicode characters
LHSName "ASCII-to-Serbian"
RHSName "UNICODE"
pass(Unicode)
; ligatures from Knuth's original CMR fonts
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
U+0060 <> U+2018 ; ` -> left single quote
U+0060 U+0060 <> U+201C ; `` -> left double quote
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
; additions supported in T1 encoding
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET
U+0041 <> U+0410 ; A
U+0042 <> U+0411 ; B
U+0043 <> U+0426 ; C
U+0043 U+0048 <> U+0427 ; CH
U+0043 U+0068 <> U+0427 ; Ch
U+0043 U+0031 <> U+040B ; C1
U+0027 U+0043 <> U+040B ; 'C
U+0044 <> U+0414 ; D
U+0044 U+004A <> U+0402 ; DJ
U+0044 U+006A <> U+0402 ; Dj
U+0044 U+005A U+0048 <> U+040F ; DZH
U+0044 U+007A U+0068 <> U+040F ; Dzh
U+0044 U+0031 <> U+040F ; D1
U+0045 <> U+0415 ; E
U+0046 <> U+0424 ; F
U+0047 <> U+0413 ; G
U+0048 <> U+0425 ; H
U+0049 <> U+0418 ; I
U+004A <> U+0408 ; J
U+004B <> U+041A ; K
U+004B U+0048 <> U+0425 ; KH
U+004B U+0068 <> U+0425 ; Kh
U+004C <> U+041B ; L
U+004C U+004A <> U+0409 ; LJ
U+004C U+006A <> U+0409 ; Lj
U+004D <> U+041C ; M
U+004E <> U+041D ; N
U+004E U+004A <> U+040A ; NJ
U+004E U+006A <> U+040A ; Nj
U+004F <> U+041E ; O
U+0050 <> U+041F ; P
;U+0051 <> ; Q
U+0052 <> U+0420 ; R
U+0053 <> U+0421 ; S
U+0053 U+0048 <> U+0428 ; SH
U+0053 U+0068 <> U+0428 ; Sh
U+0054 <> U+0422 ; T
U+0055 <> U+0423 ; U
U+0056 <> U+0412 ; V
;U+0057 <> ; W
U+0058 <> U+0425 ; X
;U+0059 ; Y
U+005A <> U+0417 ; Z
U+005A U+0048 <> U+0416 ; ZH
U+005A U+0068 <> U+0416 ; Zh
U+0061 <> U+0430 ; a
U+0062 <> U+0431 ; b
U+0063 <> U+0446 ; c
U+0063 U+0068 <> U+0447 ; ch
U+0063 U+0031 <> U+045B ; c1
U+0027 U+0063 <> U+045B ; 'c
U+0064 <> U+0434 ; d
U+0064 U+006A <> U+0452 ; dj
U+0064 U+007A U+0068 <> U+045F ; dzh
U+0064 U+0031 <> U+045F ; d1
U+0065 <> U+0435 ; e
U+0066 <> U+0444 ; f
U+0067 <> U+0433 ; g
U+0068 <> U+0445 ; h
U+0069 <> U+0438 ; i
U+006A <> U+0458 ; j
U+006B <> U+043A ; k
U+006B U+0068 <> U+0445 ; kh
U+006C <> U+043B ; l
U+006C U+006A <> U+0459 ; lj
U+006D <> U+043C ; m
U+006E <> U+043D ; n
U+006E U+006A <> U+045A ; nj
U+006F <> U+043E ; o
U+0070 <> U+043F ; p
;U+0071 <> ; q
U+0072 <> U+0440 ; r
U+0073 <> U+0441 ; s
U+0073 U+0068 <> U+0448 ; sh
U+0074 <> U+0442 ; t
U+0075 <> U+0443 ; u
U+0076 <> U+0432 ; v
;U+0077 <> ; w
U+0078 <> U+0445 ; x
;U+0079 ; y
U+007A <> U+0437 ; z
U+007A U+0068 <> U+0436 ; zh
; Additional (for official transliteration)
U+0110 <> U+0402 ; Đ
U+0111 <> U+0452 ; đ
U+017D <> U+0416 ; Ž
U+017E <> U+0436 ; ž
U+0106 <> U+040B ; Ć
U+0107 <> U+045B ; ć
U+010C <> U+0427 ; Č
U+010D <> U+0447 ; č
U+0044 U+017D <> U+040F ; DŽ
U+0044 U+017E <> U+040F ; Dž
U+0064 U+017E <> U+045F ; dž
U+0160 <> U+0428 ; Š
U+0161 <> U+0448 ; š
Then process it with
teckit_compile ascii-to-serbian.map
This will produce a file ascii-to-serbian.tec that you can put anywhere XeTeX will find it (in the working directory, for instance). Then make the following test file:
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
\begin{document}
Serbian alphabet again
\begin{serbian}
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T C1 U F Kh C Ch D1 Sh
a b v g d dj e zh z i j k l lj m n nj o p r s t c1 u f kh c ch d1 sh
\end{serbian}
\end{document}
Compile with xelatex test.tex; the text inside the serbian environment is typeset in Cyrillic.
Note 1: the characters Џ and џ can also be input as DZH (or Dzh) and dzh. If this is undesirable (it might lead to incorrect ligatures), remove the corresponding lines from ascii-to-serbian.map.
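As a hedged illustration of the ligature problem mentioned in Note 1: in a word such as nadživeti the d and ž are two separate letters, so the ASCII input nadzhiveti would be caught by the dzh rule and produce џ. One possible workaround, assuming the font mapping is applied to each uninterrupted run of characters (so that an empty group splits the run), is to type d{}zh. This is only a sketch and not part of the original answer; check the output with your XeTeX version.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
\begin{document}
\begin{serbian}
% dzh is matched as one unit by the map, giving the single letter for dž
dzhep

% Assumption: {} interrupts the character run, so d and zh are mapped separately
nad{}zhiveti
\end{serbian}
\end{document}
```

The same trick should apply to nj and lj in words where the two letters must stay separate, for instance in{}jekcija.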
Note 2: if you find it inconvenient to type C1 and c1 to get Ћ and ћ, you can add the lines
U+0027 U+0043 <> U+040B ; 'C
and
U+0027 U+0063 <> U+045B ; 'c
after the C1 and c1 entries (the listing above already includes them). This will allow you to input the characters as 'C and 'c.
If you want to input them as \'C and \'c, then insert this code after having loaded the Serbian language with Polyglossia:
% Save the standard acute-accent command
\let\standardcommandquote\'
% In Serbian text, turn \'c and \'C into the c1 and C1 input sequences
\DeclareRobustCommand{\serbiancommandquote}[1]{%
  \ifnum\strcmp{#1}{c}=0 c1\else
    \ifnum\strcmp{#1}{C}=0 C1\else
      \standardcommandquote{#1}\fi\fi}
\makeatletter
% Switch the definition of \' when entering and leaving Serbian
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother
Note 3 (added Feb. 17): if Unicode input is available, then
Đ đ Ž ž Ć ć Č č DŽ Dž dž Š š
are also mapped to
Ђ ђ Ж ж Ћ ћ Ч ч Џ Џ џ Ш ш
respectively.
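A minimal sketch of what Note 3 allows, assuming the source file is saved as UTF-8 and ascii-to-serbian.tec has been compiled from the map above. The sample sentence is only an illustration chosen here, not taken from the original answer.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
\begin{document}
\begin{serbian}
% Latin input with diacritics; following the map entries,
% Đ, č and nj become Ђ, ч and њ, so this should read: Ђак чита књигу.
Đak čita knjigu.
\end{serbian}
\end{document}
```

ASCII digraphs and accented Latin letters can be mixed freely, since both sets of rules live in the same mapping.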
The method is similar to the one used for Serbian. Prepare the following cyrillic-to-latin.map file:
; TECkit mapping for TeX input conventions <-> Unicode characters
LHSName "Cyrillic-to-Latin"
RHSName "UNICODE"
pass(Unicode)
; ligatures from Knuth's original CMR fonts
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
U+0060 <> U+2018 ; ` -> left single quote
U+0060 U+0060 <> U+201C ; `` -> left double quote
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
; additions supported in T1 encoding
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET
U+0410 <> U+0041 ; A
U+0411 <> U+0042 ; B
U+0412 <> U+0056 ; V
U+0413 <> U+0047 ; G
U+0414 <> U+0044 ; D
U+0415 <> U+0045 ; E
U+0416 <> U+017D ; Ž
U+0417 <> U+005A ; Z
U+0419 <> U+004A ; J
U+041A <> U+004B ; K
U+041B <> U+004C ; L
U+041C <> U+004D ; M
U+041D <> U+004E ; N
U+041E <> U+004F ; O
U+041F <> U+0050 ; P
U+0420 <> U+0052 ; R
U+0421 <> U+0053 ; S
U+0422 <> U+0054 ; T
U+0423 <> U+0055 ; U
U+0424 <> U+0046 ; F
U+0426 <> U+0043 ; C
U+0427 <> U+010C ; Č
U+0428 <> U+0160 ; Š
U+042D <> U+0116 ; Ė
U+042E <> U+004A U+0075 ; Ju
U+042F <> U+004A U+0061 ; Ja
U+0401 <> U+00CB ; Ë
U+0430 <> U+0061 ; a
U+0431 <> U+0062 ; b
U+0432 <> U+0076 ; v
U+0433 <> U+0067 ; g
U+0434 <> U+0064 ; d
U+0435 <> U+0065 ; e
U+0436 <> U+017E ; ž
U+0437 <> U+007A ; z
U+0438 <> U+0069 ; i
U+0439 <> U+006A ; j
U+043A <> U+006B ; k
U+043B <> U+006C ; l
U+043C <> U+006D ; m
U+043D <> U+006E ; n
U+043E <> U+006F ; o
U+043F <> U+0070 ; p
U+0440 <> U+0072 ; r
U+0441 <> U+0073 ; s
U+0442 <> U+0074 ; t
U+0443 <> U+0075 ; u
U+0444 <> U+0066 ; f
U+0446 <> U+0063 ; c
U+0447 <> U+010D ; č
U+0448 <> U+0161 ; š
U+044D <> U+0117 ; ė
U+044E <> U+006A U+0075 ; ju
U+044F <> U+006A U+0061 ; ja
U+0451 <> U+00EB ; ë
U+0456 <> U+0069 ; i
U+0406 <> U+0049 ; I
U+0454 <> U+006A U+0065 ; je
U+0404 <> U+004A U+0065 ; Je
U+0425 <> U+0058 ; X
U+0445 <> U+0078 ; x
U+0418 <> U+0049 ; I
U+0429 <> U+0160 U+010C ; ŠČ
U+042A <> U+0027 ; '
U+042B <> U+0059 ; Y
U+042C <> U+2019 ; '
U+0449 <> U+0161 U+010D ; šč
U+044A <> U+2019 ; '
U+044B <> U+0079 ; y
U+044C <> U+2019 ; '
and run it through teckit_compile to produce the file cyrillic-to-latin.tec, which should be put in a place where XeTeX can find it. Then a document such as the following
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{russian}
\newfontfamily{\transrussian}[Mapping=cyrillic-to-latin]{Linux Libertine O}
\newenvironment{translitterated}
{\transrussian\hyphenrules{nohyphenation}\ignorespaces}
{\ignorespacesafterend}
\begin{document}
\begin{russian}
Москва — столица Российской Федерации, город федерального значения,
административный центр Центрального федерального округа и центр
Московской области, в состав которой не входит. Крупнейший по
численности населения город России и Европы (население на 1 января
2012 года — 11 629 116 человек), по этому показателю входит в
десятку крупнейших городов мира. Центр Московской городской
агломерации.
\end{russian}
\begin{translitterated}
Москва — столица Российской Федерации, город федерального значения,
административный центр Центрального федерального округа и центр
Московской области, в состав которой не входит. Крупнейший по
численности населения город России и Европы (население на 1 января
2012 года — 11 629 116 человек), по этому показателю входит в
десятку крупнейших городов мира. Центр Московской городской
агломерации.
\end{translitterated}
\end{document}
will typeset the Russian paragraph followed by its Latin transliteration.
The nohyphenation setting in the translitterated environment definition is necessary because XeTeX doesn't know how to hyphenate transliterated Russian.
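For a single word or phrase inside running text, the same font family could be wrapped in a command instead of an environment. The sketch below assumes cyrillic-to-latin.tec is available; \translit is a hypothetical name introduced here for illustration and is not part of the original answer.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{russian}
\newfontfamily{\transrussian}[Mapping=cyrillic-to-latin]{Linux Libertine O}
% \translit is a hypothetical helper; the inner braces keep the
% font switch and the hyphenation change local to the argument.
\newcommand{\translit}[1]{{\transrussian\hyphenrules{nohyphenation}#1}}
\begin{document}
% Per the map entries, the Cyrillic letters of this word become M o s k v a
The capital is \translit{Москва}.
\end{document}
```

Hyphenation is switched off here for the same reason as in the environment version.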
Best Answer
You should use
because that's text.
You can consider using unicode-math instead of mathtext, but notice that amsmath must be loaded before it. This wouldn't change the way you input that subscript.