First, a font does not necessarily support all types of ligatures. Linux Libertine supports only (checked here) Ligatures={Common,Rare,Discretionary}
.
The OpenType variant of Linux Libertine shipped with media-fonts/libertine-ttf
works on an up-to-date Gentoo with TeX Live 2011. Another option is to install the dev-texlive/texlive-fontsextra
package, which also contains the font. To use it and to be able to select it by name, run this command after installation:
eselect fontconfig enable 09-texlive.conf
This will allow all programs to access fonts installed in the TeX Live texmf
tree.
Then this code will work as expected:
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Mapping=tex-text,Ligatures={Common,Rare,Discretionary}]{Linux Libertine O}
\begin{document}
Hello World
\end{document}
Note: This was tested on TeX Live 2011. Gentoo has TeX Live 2010 as the stable version. If you keep experiencing problems, try upgrading to the newer version of TeX Live.
It may be the meaning(s) of the term 'ligature' could be the driver behind the question.
This is a comment with images, so not an answer. The original post (also not an answer, more of an observation) is kept down below, for continuity.
Assuming the question is about indexing transliterated content, then there is a method that involves no decomposition of displayed material.
A string of glyphs is given to a renderer, and the renderer displays it appropriately. The string of glyphs remains available, though, so reverting the display back into its input is not required.
For example, to explore the structure of the orthography, the string संयुक्त व्यंजन, which is what Google returns when "conjunct consonants in Hindi" is entered (and which Google transliterates the pronunciation of as "sanyukt vyanjan"), can be transliterated on a glyph-by-glyph basis as:
saṇya̺uka̺ta va̺yaṇjana
using a mapping methodology whereby the inherent 'a' vowel attached to each consonant is shown, and also shown where it is switched off by the orthography rules:
- before another vowel
- in between two consonants where it is not needed
- at the end of a word
Here, arbitrarily, the inverted under-bridge combining diacritical mark is being used (via a font mapping file) as a visual representation of the switched-off vowel in the first two cases.
The \index
command can then take this transliteration string, like any other string, and do its usual work:
Code
\documentclass[12pt]{article}
\usepackage{xcolor}
\usepackage{fontspec}
\setmainfont[Script=Devanagari]{Noto Serif Devanagari}
\newfontface\translitd[Mapping=devanagari-to-latin,Scale=1.1,Colour=red]{Noto Sans}
\newfontfamily\englishfont{Noto Serif}
\usepackage{polyglossia}
\setdefaultlanguage{hindi}
\setotherlanguages{english}
\usepackage{imakeidx}
\makeindex
\begin{document}
\Large
संयुक्त व्यंजन
{\normalsize\textenglish{sanyukt vyanjan}}
{\translitd संयुक्त}\index{{\translitd संयुक्त}}
{\translitd व्यंजन}\index{{\translitd व्यंजन}}
\printindex
\end{document}
'.map' file, to compile into a '.tec' file with teckit_compile.exe
:
; TECkit mapping for TeX input conventions <-> Unicode characters
LHSName "devanagari-to-latin"
RHSName "UNICODE"
pass(Unicode)
; ligatures from Knuth's original CMR fonts
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
U+0060 <> U+2018 ; ` -> left single quote
U+0060 U+0060 <> U+201C ; `` -> left double quote
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
; additions supported in T1 encoding
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET
U+0924 <> U+0074 U+0061 ; ta
U+094D <> U+033A ; strikeout previous
U+0915 <> U+006B U+0061 ; ka
U+0941 <> U+033A U+0075 ; -u
U+092F <> U+0079 U+0061 ; ya
U+0902 <> U+006E U+0323 ; n.
U+0938 <> U+0073 U+0061 ; sa
U+0928 <> U+006E U+0061 ; na
U+091C <> U+006A U+0061 ; ja
U+0935 <> U+0076 U+0061 ; va
I would ordinarily expect the reader to want a 'normal' index, as well. Something like:
====
Original post
Looks OK for normal words (in xelatex and in the browser), if I have not misunderstood the question.
Since lualatex does not do conjunct consonants in the first place, there is no need to 'de-ligature' them to create the index entries.
For indexing by (automated) transliteration, again, xelatex is easier, using a font-map (or l3 regex replace).
\documentclass[12pt]{article}
\usepackage{fontspec}
\setmainfont[Script=Devanagari]{Noto Serif Devanagari}
\newfontfamily\englishfont{Noto Serif}
\usepackage{polyglossia}
\setdefaultlanguage{hindi}
\setotherlanguages{english}
\begin{document}
\Large
संयुक्त व्यंजन
{\normalsize\textenglish{sanyukt vyanjan}}
\noindent शुक्ल ख्मेर मुख्य अंग्रेज़ी \\
अच्छा छुट्टी ठ्रेइन बुद्ध विद्यार्थी
\noindent {\normalsize\textenglish{shukla khmer mukhya angrezî \\ achchhâ chhuTTî trein buddha vidyârthî}
ल् + म = ल्म फ़िल्म \textenglish{film}
\end{document}
Test words from https://en.wikibooks.org/wiki/Hindi/Consonant_combinations
Best Answer
You can either insert the character itself into the source, since XeTeX expects unicode-encoded source (provided your editor is compliant), or you can use \char"#### where #### is the unicode hex number.
produces:
There are probably other ways too, but these work for me.