[Tex/LaTex] Accessing Private Use Area characters with fontspec in XeTeX

fontspec opentype xetex

I am using XeTeX to typeset with OpenType fonts. One feature of OpenType fonts is the availability of the Private Use Area.

In most cases, the Private Use Area contains variants of existing characters and various ligatures. It also contains characters that have not been defined by Unicode.
The Private Use Area is very useful when typesetting many Indic texts (Sanskrit and others), which have many variant forms of the same characters and many more characters that Unicode has yet to encode. Font providers therefore supply those characters in the Private Use Area.

Question: How do I access the Private Use Area using fontspec in XeTeX?
Yes, I checked the fontspec manual, but could not find this.

Any help would be appreciated.

Best Answer

You can either insert the character itself into the source, since XeTeX expects Unicode-encoded source (provided your editor is compliant), or you can use \char"#### where #### is the Unicode code point in hexadecimal (with the letters A-F in uppercase).

\documentclass{article} 
\usepackage{fontspec}
\setmainfont{Linux Libertine O}

\begin{document}
 %Here I have the character itself, which may or may not show up on your end
\char"E000 % here is the unicode reference number
\end{document}

produces:

[image: compiled output showing the Private Use Area glyph]

There are probably other ways too, but these work for me.
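
As a minimal sketch of a few of those other ways, assuming the same font as above: \symbol{"E000} is the LaTeX wrapper around \char, and ^^^^e000 is the four-caret notation that XeTeX reads as a single character (lowercase hex digits are required there). The helper \puachar is a hypothetical name, not part of fontspec; it uses the e-TeX primitive \iffontchar to test whether the current font has a glyph at the code point before printing it.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O}

% Hypothetical helper (not part of fontspec): print the glyph at the
% given uppercase hex code point if the current font contains it,
% otherwise print a visible placeholder.
\newcommand*{\puachar}[1]{%
  \iffontchar\font "#1\relax
    \char"#1\relax
  \else
    [U+#1 missing]%
  \fi}

\begin{document}
\puachar{E000}   % checked access
\symbol{"E000}   % LaTeX wrapper around \char
^^^^e000         % four-caret notation, lowercase hex digits
\end{document}
```

The check matters for Private Use Area code points in particular, because their contents are font-specific: the same code point may hold a different glyph, or nothing at all, in another font.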

Related Solutions

[Tex/LaTex] Problems with opentype options under XeTeX

First, a font does not necessarily support all types of ligatures. Linux Libertine supports only (checked here) Ligatures={Common,Rare,Discretionary}.
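
One way to check this from inside a document, rather than from the font's documentation, is the programming-layer test \fontspec_if_feature:nTF, which asks whether the currently selected font contains a raw OpenType feature tag. The sketch below assumes the font is already found by name; \checkfeature is a hypothetical wrapper, and the tags liga, dlig and hlig are the standard OpenType tags for common, discretionary and historical ligatures.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O}

\ExplSyntaxOn
% Hypothetical wrapper: report whether the current font has the
% raw OpenType feature tag given as the argument.
\NewDocumentCommand { \checkfeature } { m }
  { \fontspec_if_feature:nTF {#1} { #1 :~ yes } { #1 :~ no } }
\ExplSyntaxOff

\begin{document}
\checkfeature{liga}\par % common ligatures
\checkfeature{dlig}\par % discretionary (rare) ligatures
\checkfeature{hlig}     % historical ligatures
\end{document}
```

The result depends on the installed version of the font, so no particular output is assumed here.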

The OpenType variant of Linux Libertine shipped with media-fonts/libertine-ttf works on an up-to-date Gentoo with TeX Live 2011. Another option is to install the dev-texlive/texlive-fontsextra package, which also contains the font. To use it and to be able to select it by name, run this command after installation:

eselect fontconfig enable 09-texlive.conf

This will allow all programs to access fonts installed in the TeX Live texmf tree.

Then this code will work as expected:

\documentclass{article}

\usepackage{fontspec}
\setmainfont[Mapping=tex-text,Ligatures={Common,Rare,Discretionary}]{Linux Libertine O}

\begin{document}

Hello World

\end{document}

Note: This was tested on TeX Live 2011. Gentoo has TeX Live 2010 as the stable version. If you keep experiencing problems, try upgrading to the newer version of TeX Live.

[Tex/LaTex] Complex ligatures in Devanāgarī

The different meanings of the term 'ligature' may be what is driving the question.

This is a comment with images, so not an answer. The original post (also not an answer, more of an observation) is kept down below, for continuity.

Assuming the question is about indexing transliterated content, then there is a method that involves no decomposition of displayed material.

A string of glyphs is given to a renderer, and the renderer displays it appropriately. The string of glyphs remains available, though, so reverting the display back into its input is not required.

For example, to explore the structure of the orthography, the string संयुक्त व्यंजन, which is what Google returns when "conjunct consonants in Hindi" is entered (and which Google transliterates the pronunciation of as "sanyukt vyanjan"), can be transliterated on a glyph-by-glyph basis as:

saṇya̺uka̺ta va̺yaṇjana

using a mapping methodology whereby the inherent 'a' vowel attached to each consonant is shown, and also shown where it is switched off by the orthography rules:

before another vowel
in between two consonants where it is not needed
at the end of a word

Here, arbitrarily, the inverted under-bridge combining diacritical mark is being used (via a font mapping file) as a visual representation of the switched-off vowel in the first two cases.

The \index command can then take this transliteration string, like any other string, and do its usual work:

Code

\documentclass[12pt]{article}
\usepackage{xcolor}
\usepackage{fontspec}
\setmainfont[Script=Devanagari]{Noto Serif Devanagari}
\newfontface\translitd[Mapping=devanagari-to-latin,Scale=1.1,Colour=red]{Noto Sans}
\newfontfamily\englishfont{Noto Serif}
\usepackage{polyglossia}
\setdefaultlanguage{hindi}
\setotherlanguages{english}
\usepackage{imakeidx}
\makeindex

\begin{document}
\Large
संयुक्त व्यंजन
{\normalsize\textenglish{sanyukt vyanjan}}

{\translitd संयुक्त}\index{{\translitd संयुक्त}}
{\translitd व्यंजन}\index{{\translitd व्यंजन}}


\printindex
\end{document}

The '.map' file, to be compiled into a '.tec' file with teckit_compile.exe:

; TECkit mapping for TeX input conventions <-> Unicode characters

LHSName "devanagari-to-latin"
RHSName "UNICODE"

pass(Unicode)

; ligatures from Knuth's original CMR fonts
U+002D U+002D           <>  U+2013  ; -- -> en dash
U+002D U+002D U+002D    <>  U+2014  ; --- -> em dash

U+0027          <>  U+2019  ; ' -> right single quote
U+0027 U+0027   <>  U+201D  ; '' -> right double quote
U+0022           >  U+201D  ; " -> right double quote

U+0060          <>  U+2018  ; ` -> left single quote
U+0060 U+0060   <>  U+201C  ; `` -> left double quote

U+0021 U+0060   <>  U+00A1  ; !` -> inverted exclam
U+003F U+0060   <>  U+00BF  ; ?` -> inverted question

; additions supported in T1 encoding
U+002C U+002C   <>  U+201E  ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C   <>  U+00AB  ; << -> LEFT POINTING GUILLEMET
U+003E U+003E   <>  U+00BB  ; >> -> RIGHT POINTING GUILLEMET



U+0924 <> U+0074 U+0061 ;  ta 
U+094D <> U+033A ; strikeout previous
U+0915 <> U+006B U+0061 ; ka
U+0941 <> U+033A U+0075 ; -u
U+092F <> U+0079 U+0061 ; ya
U+0902 <> U+006E U+0323 ; n.
U+0938 <> U+0073 U+0061 ; sa
U+0928 <> U+006E U+0061 ; na
U+091C <> U+006A U+0061 ; ja
U+0935 <> U+0076 U+0061 ; va

I would ordinarily expect the reader to want a 'normal' index, as well. Something like:

====

Original post

Looks OK for normal words (in xelatex and in the browser), if I have not misunderstood the question.

Since lualatex does not do conjunct consonants in the first place, there is no need to 'de-ligature' them to create the index entries.

For indexing by (automated) transliteration, again, xelatex is easier, using a font-map (or l3 regex replace).


\documentclass[12pt]{article}
\usepackage{fontspec}
\setmainfont[Script=Devanagari]{Noto Serif Devanagari}
\newfontfamily\englishfont{Noto Serif}
\usepackage{polyglossia}
\setdefaultlanguage{hindi}
\setotherlanguages{english}


\begin{document}
\Large
संयुक्त व्यंजन

{\normalsize\textenglish{sanyukt vyanjan}}

\noindent   शुक्ल ख्मेर मुख्य अंग्रेज़ी \\ 
    अच्छा छुट्टी ठ्रेइन बुद्ध विद्यार्थी


\noindent {\normalsize\textenglish{shukla khmer mukhya angrezî \\ achchhâ chhuTTî trein buddha vidyârthî}}


ल् +    म = ल्म     फ़िल्म  \textenglish{film}
\end{document}

Test words from https://en.wikibooks.org/wiki/Hindi/Consonant_combinations
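
The 'l3 regex replace' route mentioned in the original post can be sketched as follows. This is only an illustration under stated assumptions: \translit and \l_translit_tmp_tl are hypothetical names, the replacement table copies the ten Devanagari entries of the '.map' file above, and Noto Serif is assumed to cover the combining marks U+0323 and U+033A. A recent expl3 kernel with the regex module built in is also assumed.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Noto Serif} % assumed to cover U+0323 and U+033A

\ExplSyntaxOn
\tl_new:N \l_translit_tmp_tl % scratch variable (hypothetical name)
% \translit{<Devanagari text>}: replace each listed code point by the
% Latin string used in the TECkit mapping, one rule at a time.
\NewDocumentCommand { \translit } { m }
  {
    \tl_set:Nn \l_translit_tmp_tl {#1}
    \regex_replace_all:nnN { \x{0924} } { ta }        \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{094D} } { \x{033A} }  \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{0915} } { ka }        \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{0941} } { \x{033A}u } \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{092F} } { ya }        \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{0902} } { n\x{0323} } \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{0938} } { sa }        \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{0928} } { na }        \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{091C} } { ja }        \l_translit_tmp_tl
    \regex_replace_all:nnN { \x{0935} } { va }        \l_translit_tmp_tl
    \tl_use:N \l_translit_tmp_tl
  }
\ExplSyntaxOff

\begin{document}
\translit{संयुक्त व्यंजन}
\end{document}
```

Unlike the font mapping, which only changes what is displayed, this approach rewrites the token list itself, so the Latin string is what later processing sees. The command as written is not expandable, so it is not suitable for use directly inside \index without further work.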
