[Tex/LaTex] How to use unicode symbols in TeX source

unicode

I've tried using \usepackage[utf8]{inputenc}, but that throws an error:

Package inputenc Error: Unicode char µ (U+B5)
(inputenc)                not set up for use with LaTeX.

Whatever I do, Unicode characters in the source either do not appear, or do not appear and also throw an error. I would like to use both Russian and mathematical symbols in my source code.

How can I do this?

Best Answer

Probably this will miss more complex scenarios, but from here:

%%
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\DeclareUnicodeCharacter{B5}{\ifmmode\mu\else\textmu\fi}
%
% use http://shapecatcher.com/ to find the char
% or https://w3c.github.io/xml-entities/unicode-names.html

\begin{document}

This is a textmode µ and this a math mode one: $µ_µ$.

\end{document}

If you do not want to define every character yourself, you need to switch to a native Unicode TeX engine, such as XeLaTeX. Compile the following code with xelatex:

%%
\documentclass{article}
\usepackage{fontspec}
\usepackage{unicode-math}
%
% use http://shapecatcher.com/ to find the char
% or https://w3c.github.io/xml-entities/unicode-names.html

\begin{document}

Is the character ẁ used in some language?

This is a textmode µ and this a math mode one: $𝜇_𝜇$. 
(But be careful, math mu is a different unicode codepoint,
\texttt{1D707}.)

\end{document}

to obtain:

Here I am using the unicode-math package; I found the math-mode µ code point at shapecatcher.

Related Solutions

[Tex/LaTex] Using both unicode and russian in tex source

You need to tell LaTeX which languages you intend to use (T2A is the font encoding for Cyrillic):

\documentclass{article}
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,french,english]{babel}

\begin{document}

In Dahl’s dictionary there is a similar sounding word “\foreignlanguage{russian}{дуван}”.

\begin{otherlanguage*}{french}
“C’est auprès de son père, écrivain de la nation pisane à la douane de Bougie, à
la fin du douzième siècle[todo], que le célèbre mathématicien Léonard
Bonacci
\end{otherlanguage*}

\end{document}


The main problem is that fonts have only 256 slots available for glyphs and writing in French and Russian requires more than 256 glyphs. (Maybe this is not strictly true, but even if the number of glyphs were less than 256, a special output encoding for French and Russian would be needed; what about German and Russian, Polish and Russian, or a mixing of three languages?)

You can always define an abbreviation, say \RUS, for typesetting isolated words in Russian

\newcommand{\RUS}[1]{\foreignlanguage{russian}{#1}}

(or, more efficiently, \newcommand{\RUS}{\foreignlanguage{russian}}). You have the benefit that hyphenation will be correct.
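
As a minimal sketch of how the abbreviation might be used in a complete document (this assumes pdflatex and that Cyrillic fonts for the T2A encoding are installed; the sample sentence is taken from the example above):

\documentclass{article}
\usepackage[T2A,T1]{fontenc}        % T1 is loaded last, so it is the default encoding
\usepackage[utf8]{inputenc}
\usepackage[russian,english]{babel} % english is loaded last, so it is the main language

% Shorthand for isolated Russian words; switches encoding and hyphenation
\newcommand{\RUS}[1]{\foreignlanguage{russian}{#1}}

\begin{document}

In Dahl's dictionary there is a similar sounding word ``\RUS{дуван}''.

\end{document}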

A different approach is to use an OpenType font that contains all the needed glyphs, but of course this requires XeLaTeX or LuaLaTeX with fontspec.
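
A rough sketch of that approach could look like the following. It assumes that the FreeSerif font is installed on the system and covers both Latin and Cyrillic, and that the installed babel Russian support works with Unicode engines; neither assumption comes from the answer above, so treat this as a starting point only:

% Compile with xelatex or lualatex, not pdflatex
\documentclass{article}
\usepackage{fontspec}               % no inputenc or fontenc needed with a Unicode engine
\setmainfont{FreeSerif}             % assumption: installed, with Latin and Cyrillic glyphs
\usepackage[russian,english]{babel} % english is the main language

\begin{document}

In Dahl's dictionary there is a similar sounding word ``\foreignlanguage{russian}{дуван}''.

\end{document}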

[Tex/LaTex] Can’t get unicode to work in \title (inputenc: Unicode char not set up for use with LaTeX)

Short answer: You can use \DeclareUnicodeCharacter{23B5}{\textvisiblespace}. Or, instead of ⎵ (U+23B5 BOTTOM SQUARE BRACKET), you can use ␣ (U+2423 OPEN BOX), which is already defined as \textvisiblespace.

Long answer: When you type ⎵ in your input file and save it with UTF-8 encoding (which is probably the default in your editor), this character U+23B5 BOTTOM SQUARE BRACKET gets stored as the sequence of bytes E2 8E B5. Then when TeX reads the file, because of the \usepackage[utf8]{inputenc} in the preamble, it knows that when it sees those bytes, it should understand it as the character U+23B5. This is great, but quite separately TeX needs to know what to do with that character.

You can in fact give any instruction for any character. In your case it is as simple as picking a symbol and typesetting it. You can look it up in The Comprehensive LaTeX Symbol List (probably available on your computer with texdoc symbols-a4) and see that the symbol is available (in fact under “Frequently Requested Symbols”) as \textvisiblespace. So you can use that:

\documentclass{article}
\usepackage[utf8]{inputenc}

\usepackage[lf]{venturis}
\usepackage[T1]{fontenc}
\DeclareUnicodeCharacter{23B5}{\textvisiblespace}

\title{Foo⎵Bar}  % this is fine now
\author{Myself}
\date{}

\begin{document}
\maketitle
\end{document}

produces

(Instead of \DeclareUnicodeCharacter{23B5}{\textvisiblespace} you can also use \newunicodechar{⎵}{{\textvisiblespace}} from the newunicodechar package.)
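
For comparison, a minimal sketch of the newunicodechar variant might look like this. It assumes pdflatex and leaves out the venturis font package from the example above, since that package is specific to the original document:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{newunicodechar}

% The character itself is typed instead of its hexadecimal code point
\newunicodechar{⎵}{\textvisiblespace}

\title{Foo⎵Bar}
\author{Myself}
\date{}

\begin{document}
\maketitle
\end{document}

The main design difference is readability: the preamble shows the literal character being defined, so there is no need to look up its code point.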

In fact, some common definitions come with a typical TeX distribution: specifically, on my computer /usr/local/texlive/2017/texmf-dist/tex/latex/base/utf8.def contains definitions for ©, ®, º, ˆ, ˜, ZWNJ, …, ™, ␣:

\DeclareUnicodeCharacter{00A9}{\textcopyright}
\DeclareUnicodeCharacter{00AA}{\textordfeminine}
\DeclareUnicodeCharacter{00AE}{\textregistered}
\DeclareUnicodeCharacter{00BA}{\textordmasculine}
\DeclareUnicodeCharacter{02C6}{\textasciicircum}
\DeclareUnicodeCharacter{02DC}{\textasciitilde}
\DeclareUnicodeCharacter{200C}{\textcompwordmark}
\DeclareUnicodeCharacter{2026}{\textellipsis}
\DeclareUnicodeCharacter{2122}{\texttrademark}
\DeclareUnicodeCharacter{2423}{\textvisiblespace}

but defining ⎵ as \textvisiblespace is not one of them.

Alternative: If you use a Unicode-aware TeX engine like XeTeX or LuaTeX (compile your file with xelatex or lualatex rather than pdflatex), then you can simply use the ⎵ character from a system font that contains it. For example:

\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif} % Sets this font for the entire document
\begin{document}
Foo⎵Bar
\end{document}

\documentclass{article}
\usepackage{fontspec}
\newfontfamily{\symbolfont}{Noto Sans Symbols}
\usepackage{newunicodechar}
\newunicodechar{⎵}{{\symbolfont⎵}} % Use this font for this specific character
\begin{document}
Foo⎵Bar
\end{document}

Watch out that TeX has a rather serious (IMO) usability/design flaw here: if the character is missing in the font, then it doesn't cause an error but simply shows a missing character warning in the .log file. You can use \tracinglostchars=2 to have the message shown in the terminal, but you still need to watch out for it.
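
If your distribution is recent enough, a stricter setting may help. As far as I know, engines shipped with TeX Live 2021 or later treat \tracinglostchars=3 as a request to raise an error for a missing character; on older installations this should not be relied on. A sketch, again assuming the FreeSerif font is installed:

% Compile with xelatex or lualatex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif}
% Assumption: TeX Live 2021 or newer, where a value of 3 or more
% turns a missing glyph into an error instead of a log-file warning
\tracinglostchars=3

\begin{document}
Foo⎵Bar
\end{document}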
