[TeX/LaTeX] Special Unicode characters with XeTeX

fonts unicode xetex

Most modern fonts are encoded in Unicode but provide only a subset of the glyphs that Unicode defines. Most lack the Latin Extended Additional range (http://www.unicode.org/charts/PDF/U1E00.pdf). In TeX, characters outside the ASCII range can be constructed with commands like \"a for ä, \=a for ā, \.n for ṅ, etc. This mechanism does not seem to be supported in XeTeX. You can still use commands like \d{m} to get ṃ, but if the glyph is not in the font, it will not be constructed from a dot and an m as in TeX, so these commands are of no help in XeTeX.

Then I was happy to find the newunicodechar package, which can redefine characters: after \newunicodechar{ṃ}{\d{m}} you can just type ṃ in the document. Although this works in pdfTeX, it doesn't work in XeTeX, since the latter seems to lack the "character building function", so XeTeX will again look for the character in the font, where it is not available. The only way to get the character printed is to switch to a different font for those special characters (also by way of \newunicodechar, for example). But this may look rather ugly.
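The font-substitution workaround described above can be restricted to the glyphs that are really missing. The following is only a minimal sketch under stated assumptions: \fallbackchar is a hypothetical helper name (not part of newunicodechar or fontspec), Adobe Garamond Pro stands in for a main font that lacks the glyphs, and Junicode is assumed to be installed as a fallback font. It relies on the e-TeX primitive \iffontchar, which XeTeX provides, to test whether the current font contains a character.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Adobe Garamond Pro}   % assumption: a main font lacking U+1E43 and U+1E45
\newfontfamily{\diafont}{Junicode} % assumption: a fallback font that has the glyphs
\usepackage{newunicodechar}

% \fallbackchar is a hypothetical helper, not provided by any package.
% \iffontchar tests whether the current font (\font) contains the
% character whose code is given by `#1.
\newcommand{\fallbackchar}[1]{%
  \iffontchar\font`#1\relax
    #1% glyph is present: typeset it in the current font
  \else
    {\diafont #1}% glyph is missing: switch to the fallback font locally
  \fi}

\newunicodechar{ṃ}{\fallbackchar{ṃ}}
\newunicodechar{ṅ}{\fallbackchar{ṅ}}

\begin{document}
m with dot underneath: ṃ

n with dot above: ṅ
\end{document}
```

With this approach the fallback font is used only where the main font has no glyph, so changing the main font later needs no further adjustments. The visual mismatch between the two fonts remains wherever the fallback is used.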

Package options like \usepackage[utf8x]{inputenc} are also not compatible with XeLaTeX, at least not in my installation (MiKTeX 2.9).

At least this is my conclusion after some hours of trial and error.

So it seems that for my document I can either go back to (pdf)LaTeX and lose the easy font features of XeLaTeX, or substitute another font for the special characters.

Or is there another way to get around the problem without having to fall back on pdfLaTeX?

Here is the MWE for pdfLaTeX. To compile it with XeLaTeX, remove the % signs from the commented lines and put one in front of \usepackage{palatino} instead.

\documentclass[a4paper,12pt]{article}

% \usepackage{fontspec}  % <----- XeLaTeX
% \defaultfontfeatures{Mapping=tex-text} 
% \setmainfont{Adobe Garamond Pro} % <-- or another font without Latin Ext. Additional

\usepackage{palatino}  % <---- pdfLaTeX

\usepackage{newunicodechar}

% \newfontfamily{\diafont}{Junicode} % <---- redefining the font works in XeLaTeX 
% \newunicodechar{ṃ}{\diafont ṃ}

\newunicodechar{ā}{\={a}} % <-- character substitution works in pdfLaTeX
\newunicodechar{ṃ}{\d{m}} %     but could also be achieved by the inputenc package
\newunicodechar{ṅ}{{\.n}}

\begin{document}
German Umlaute: 
\begin{itemize} 
\item Unicode characters: ä ö ü Ä Ö Ü 
\item by \LaTeX\ command: \"a \"o \"u \"A \"O \"U

\end{itemize}

a with macron above:
\begin{itemize}
\item by function: ā
\item by \LaTeX: {\=a}
\end{itemize}

m with dot underneath: 
\begin{itemize}
\item by function: ṃ
\item by \LaTeX: \d{m}
\end{itemize}

n with dot above:
\begin{itemize}
\item by function: ṅ
\item by \LaTeX: {\.n}
\end{itemize}

\end{document}

Best Answer

The code in my answer to "TeX accents do not seem to work with fontspec and xe/lua/latex" gives the idea, but some additional code is needed for the dot above.

\documentclass[a4paper,12pt]{article}

\usepackage{fontspec}
\defaultfontfeatures{Ligatures=TeX} 
\setmainfont{Minion Pro} % a font without Latin Ext. Additional

\usepackage{newunicodechar}

\UndeclareUTFcomposite[\UTFencname]{x0101}{\=}{a}
\UndeclareUTFcomposite[\UTFencname]{x1E43}{\d}{m}
\UndeclareUTFcomposite[\UTFencname]{x1E45}{\.}{n}
\makeatletter
\let\d\relax
\DeclareRobustCommand{\d}[1]
  {\hmode@bgroup
   \o@lign{\relax#1\crcr\hidewidth\ltx@sh@ft{-1ex}.\hidewidth}\egroup
}
\let\.\relax
\DeclareRobustCommand{\.}[1]
  {\hmode@bgroup\vbox{% \o@lign has \vtop
   \lineskiplimit\z@
   \baselineskip\z@skip
   \lineskip.25ex
   \ialign {##\crcr\hidewidth.\hidewidth\crcr#1\crcr}}\egroup
}
\makeatother
\newunicodechar{ā}{\={a}}
\newunicodechar{ṃ}{\d{m}}
\newunicodechar{ṅ}{{\.n}}

\begin{document}
German Umlaute: 
\begin{itemize} 
\item Unicode characters: ä ö ü Ä Ö Ü 
\item by \LaTeX\ command: \"a \"o \"u \"A \"O \"U
\end{itemize}

a with macron above:
\begin{itemize}
\item by function: ā
\item by \LaTeX: {\=a}
\end{itemize}

m with dot underneath: 
\begin{itemize}
\item by function: ṃ
\item by \LaTeX: \d{m}
\end{itemize}

n with dot above:
\begin{itemize}
\item by function: ṅ
\item by \LaTeX: \.n
\end{itemize}

\end{document}
