Ligatures are known to the Western European languages, however it isn't that common to spot vertical or complex form of them. As an exercise in typography I tried to get some complex ligatures in Devanāgarī script.
I've downloaded the chandas.ttf file (Southern style; uttara.ttf would serve well as a test case) with complex ligatures in it as well as the Sanskrit 2003 font which doesn't contain complex ligatures to show common form of writing.
I've opened the font in FontForge where we can see those ligatures and their names. Ligatures are properly mapped, so we can use them. I only changed some letters to fit transliteration schemes: Y
as nj
or J
, G
as ng
and I am using Lexilogos and Sanscript to get the portion of words.
Note: In case you are wondering what I am trying to achieve I can say that I try to convert transliterated words sorted in Xindy (Sanskrit, Pāḷi, hopefully even Tamiḻ and Siṇhala later) and I am checking what options I have to display index entries. As mapping is not supported in LuaTeX, I am trying to prepare standalone Lua scripts which will replace Latin letters.
In theory, I have two problems now:
1) How to turn off those complex ligatures locally in document and how to get regular form of writing, if needed? To preview a common form I used different font in the example below. After using otfinfo -f chandas.ttf
we know there are three features, but it is not helping if we turn them off.
2) How to get complex ligatures in LuaLaTeX? As far as I know the support for Indic languages is very limited. Under normal circumstances I am using \char
to get a specific glyph, but those complex ligatures are not mapped as Unicode glyphs in the private use areas (PUA). I would be able to use \XeTeXglyph
and the glyph's slot, but it is not easy as well. FontForge is showing 3417 (0x0D59) for DNjYa
glyph, but the actual position got from XeTeX (\the\XeTeXglyphindex"DNjYa"
) is 2510 (0x09CE). What a day!
A bonus. There is even more fun, if we try to get two complex ligatures next to each other (I cannot say if it is correct linguistically, it is not almost certainly), e.g. DNjJ+DNjJa, the middle letters form ligature earlier while typing it. The solution is to use \mbox{}
, see the last line in the example.
I enclose an example and a preview of my efforts. We can run xelatex
and lualatex
. If you are interested, an encoding table can be obtained from http://www.sanskritweb.net/cakram/chandas-encoding.pdf and a preview of all ligatures from http://www.sanskritweb.net/cakram/saMyoga-pattra.pdf
% run: xelatex or lualatex mal-sanskrit.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
\usepackage{ifxetex}
\usepackage{fontspec}
% Possible addtion for LuaLaTeX (1 line):
%\usepackage{luatextra}
% Possible addition for XeLaTeX (3 lines):
%\usepackage{polyglossia}
%\setmainlanguage{sanskrit}
%\newfontfamily{\devanagarifont}{Sanskrit2003}
\parindent=0pt
\begin{document}
Correct form in Xe\LaTeX, incorrect in Lua\LaTeX\ (2 lines):\par
\setmainfont[Script=Devanagari]{Sanskrit2003.ttf}
ड्ण्ज्ञ (common form; using different font [Sanskrit2003.ttf])\par % ड्ण्ज्ञ
\setmainfont[Script=Devanagari]{chandas.ttf}
ड्ण्ज्ञ (glyph DNjYa; DNjnja [Lexilogos] or DNjJa [Harvard-Kyoto])\par\medskip
% Y as nj or J, G as ng
\setmainfont[Script=Devanagari,RawFeature=-liga;-mkmk;-mark]{chandas.ttf}
ड्ण्ज्ञ (a form of ligature; Devanāgarī script)\par
\setmainfont[Script=Latin]{chandas.ttf}
ड्ण्ज्ञ (incorrect form; Latin script)\par
\setmainfont[Script=Devanagari]{chandas.ttf}
ड्\mbox{}ण्\mbox{}ज्ञ (incorrect form; separated glyphs, \verb.\mbox.)\par\medskip
\ifxetex
3417 as \XeTeXglyph"0D59\ versus % 3417
\the\XeTeXglyphindex"DNjYa"\ as % 2510
\XeTeXglyph2510\ or\ \XeTeXglyph"09CE\ (getting slot number)\par
ड्ण्ज्ञ्ड्ण्ज्ञ / ड्ण्ज्ञ्\mbox{}ड्ण्ज्ञ (DNjJ DNjJa)
\fi
\end{document}
Best Answer
It may be the meaning(s) of the term 'ligature' could be the driver behind the question.
This is a comment with images, so not an answer. The original post (also not an answer, more of an observation) is kept down below, for continuity.
Assuming the question is about indexing transliterated content, then there is a method that involves no decomposition of displayed material.
A string of glyphs is given to a renderer, and the renderer displays it appropriately. The string of glyphs remains available, though, so reverting the display back into its input is not required.
For example, to explore the structure of the orthography, the string संयुक्त व्यंजन, which is what Google returns when "conjunct consonants in Hindi" is entered (and which Google transliterates the pronunciation of as "sanyukt vyanjan"), can be transliterated on a glyph-by-glyph basis as:
saṇya̺uka̺ta va̺yaṇjana
using a mapping methodology whereby the inherent 'a' vowel attached to each consonant is shown, and also shown where it is switched off by the orthography rules:
Here, arbitrarily, the inverted under-bridge combining diacritical mark is being used (via a font mapping file) as a visual representation of the switched-off vowel in the first two cases.
The
\index
command can then take this transliteration string, like any other string, and do its usual work:Code
'.map' file, to compile into a '.tec' file with
teckit_compile.exe
:I would ordinarily expect the reader to want a 'normal' index, as well. Something like:
====
Original post
Looks OK for normal words (in xelatex and in the browser), if I have not misunderstood the question.
Since lualatex does not do conjunct consonants in the first place, there is no need to 'de-ligature' them to create the index entries.
For indexing by (automated) transliteration, again, xelatex is easier, using a font-map (or l3 regex replace).
Test words from https://en.wikibooks.org/wiki/Hindi/Consonant_combinations