Some Unicode characters display successfully, but others do not

overleaf unicode

I am trying to display some non-ASCII characters, such as ɩ LATIN SMALL LETTER IOTA (U+0269), ʋ LATIN SMALL LETTER V WITH HOOK (U+028B), and ɔ LATIN SMALL LETTER OPEN O (U+0254).

I tried almost every approach I could find, but none of them worked. Here is a subset of the potentially crucial things I tried without success:

  • all compilers available on Overleaf: pdfLaTeX, LaTeX, XeLaTeX, and LuaLaTeX

  • \usepackage[utf8]{inputenc} and \usepackage[utf8x]{inputenc} (I know utf8x should be avoided, but with utf8x some of the characters do display.)

  • \usepackage[T1]{fontenc}, \usepackage[T5]{fontenc}, \usepackage[T1,T4]{fontenc}

Minimal Example to demonstrate the issue

\documentclass[11pt]{article}
\usepackage{times}
\usepackage{tipa}
\usepackage{textcomp} 
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc} 

\begin{document}
line1: \unichar{"0111}, \unichar{"00F6} \\
line2: đ, ö \\
line3: \unichar{"0254}, \unichar{"0269}, \unichar{"028B} \\
line4: ɔ, ɩ, ʋ
\end{document}

The output: [image]

I have read the Unicode documentation provided by Overleaf (https://www.overleaf.com/learn/latex/Articles/Unicode%2C_UTF-8_and_multilingual_text%3A_An_introduction, https://www.overleaf.com/learn/latex/Multilingual_typesetting_on_Overleaf_using_babel_and_fontspec) and some other Unicode documentation pages to get some basic knowledge of Unicode.

While looking for answers, I found that some solutions involve specifying languages for the babel package. However, my paper will include text from many languages (possibly tens of them), and they are mostly under-represented languages (some Indigenous and/or endangered) that lack academic linguistic studies, so I do not even know which scripts they use. I am not even sure whether the problem has anything to do with the script or language.

I went with the \unichar{"xxxx} solution because the ^^^^xxxx syntax does not work. With the \unichar{"xxxx} syntax, some characters display fine, e.g. \unichar{"0111} (đ, latin small letter d with stroke) and \unichar{"00F6} (ö, latin small letter o with diaeresis). However, others do not, e.g. \unichar{"0254} (ɔ, latin small letter open o), \unichar{"0269} (ɩ, latin small letter iota), and \unichar{"028B} (ʋ, latin small letter v with hook).

I am wondering why some characters can be displayed and others cannot, even though they are all specified by their Unicode code points in hexadecimal notation, in an identical LaTeX environment, using the same command (\unichar{}).

Also, is there a way to display those characters which currently cannot be rendered?

Best Answer

Using the utf8 encoding (or any other) does not mean that every font you use will have a glyph for every encoded character. In fact, there are so many characters that this almost never happens (except perhaps with Unifont, which looks horribly pixelated). Some fonts contain only the capital Latin letters or so, while others have thousands of glyphs but still do not cover every Unicode character.
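
To see which glyphs a given font actually provides, a minimal sketch along the following lines may help. It assumes XeLaTeX or LuaLaTeX with fontspec, and it relies on the e-TeX primitive \iffontchar to ask the current font whether it has a glyph at a code point. The \checkglyph macro is a hypothetical helper defined only for this illustration; it is not part of any package.

% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
% \checkglyph{<hex code point>} is a hypothetical helper for this sketch:
% it prints the glyph if the current font has one, and "(missing)" otherwise.
% Write the hex digits in upper case (e.g. 028B, not 028b).
\newcommand{\checkglyph}[1]{%
  U+#1: \iffontchar\font"#1 \symbol{"#1}\else (missing)\fi\par}
\begin{document}
% Checked against the default font
\checkglyph{0111}
\checkglyph{00F6}
\checkglyph{0254}
\checkglyph{0269}
\checkglyph{028B}
\end{document}

Running the same checks after a \setmainfont{...} call shows whether a candidate font covers the characters you need before you commit to it.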

So, a first step in dealing with unusual characters could be to look for a good font that contains most or all of the "rare" characters you will need. With xelatex and lualatex you are not limited to TeX fonts, so there are many alternatives:


\documentclass{article}
\usepackage{fontspec}
\usepackage{tabto}\NumTabs{5}
\begin{document}
\obeylines
đ ö ɔ ɩ ʋ \tab ← not all printed (default) 
{\setmainfont{GFS Didot} đ ö ɔ ɩ ʋ}     \tab ← Wrong font (missing characters)
{\setmainfont{FontAwesome} đ ö ɔ ɩ ʋ}   \tab ← FontAwesome is not for this ...
{\setmainfont{Unifont} đ ö ɔ ɩ ʋ}       \tab ← Unifont works, but ...
{\setmainfont{FreeSerif} đ ö ɔ ɩ ʋ}     \tab ← FreeSerif works 
{\setmainfont{DejaVu Serif}  đ ö ɔ ɩ ʋ} \tab ← DejaVu Serif works 
{\setmainfont{EB Garamond} đ ö ɔ ɩ ʋ}   \tab ← EB Garamond works 
\end{document}
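
Once a suitable font is found, it can be set once for the whole document instead of per line. The following is a minimal sketch, assuming XeLaTeX or LuaLaTeX and that DejaVu Serif (one of the fonts that worked in the test above) is installed on the system. Any other font that covers the needed characters can be substituted.

% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{DejaVu Serif} % assumption: this font is available to the compiler
\begin{document}
% Characters typed directly in the UTF-8 source
Typed directly: đ, ö, ɔ, ɩ, ʋ

% The same characters selected by hexadecimal code point
By code point: \symbol{"0111}, \symbol{"00F6}, \symbol{"0254}, \symbol{"0269}, \symbol{"028B}
\end{document}

With these engines neither inputenc nor fontenc is needed, since the source is read as UTF-8 and the font is addressed by Unicode code point.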