You need to tell LaTeX which languages you intend to use (T2A is the font encoding for Cyrillic):
\documentclass{article}
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,french,english]{babel}
\begin{document}
In Dahl’s dictionary there is a similar sounding word “\foreignlanguage{russian}{дуван}”.
\begin{otherlanguage*}{french}
“C’est auprès de son père, écrivain de la nation pisane à la douane de Bougie, à
la fin du douzième siècle[todo], que le célèbre mathématicien Léonard
Bonacci
\end{otherlanguage*}
\end{document}
The main problem is that traditional TeX fonts have only 256 slots available for glyphs, and writing in French and Russian requires more than 256 glyphs. (Maybe this is not strictly true, but even if the number of glyphs were less than 256, a special output encoding for French and Russian would be needed; and what about German and Russian, Polish and Russian, or a mix of three languages?)
You can always define an abbreviation, say \RUS, for typesetting isolated words in Russian:
\newcommand{\RUS}[1]{\foreignlanguage{russian}{#1}}
(or, more efficiently, \newcommand{\RUS}{\foreignlanguage{russian}}). This has the benefit that hyphenation will be correct.
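As a minimal sketch of how the abbreviation might be used in a complete document (compiled with pdflatex, and assuming Cyrillic fonts for the T2A encoding are installed; the sample sentence is only an illustration):

```latex
\documentclass{article}
\usepackage[T2A,T1]{fontenc}        % T2A for Cyrillic, T1 (loaded last) as the default
\usepackage[utf8]{inputenc}
\usepackage[russian,english]{babel} % the last language listed (english) is the main one
% Shorthand for isolated Russian words; switches encoding and hyphenation patterns
\newcommand{\RUS}[1]{\foreignlanguage{russian}{#1}}
\begin{document}
Dahl's dictionary lists the similar sounding word \RUS{дуван}.
\end{document}
```

The shorthand only saves typing; the language switch itself is still done by \foreignlanguage.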
A different approach is to use an OpenType font that contains all the needed glyphs, but this of course requires XeLaTeX or LuaLaTeX with fontspec.
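A rough sketch of that approach follows. It assumes that FreeSerif (or any other font covering both Latin and Cyrillic) is installed, and that the installed babel-russian is recent enough to support Unicode engines; neither assumption comes from the answer itself.

```latex
% Compile with xelatex or lualatex, not pdflatex.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif} % assumption: an installed font with Latin and Cyrillic glyphs
\usepackage[russian,french,english]{babel}
\begin{document}
In Dahl's dictionary there is a similar sounding word
``\foreignlanguage{russian}{дуван}''.
\end{document}
```

No fontenc or inputenc lines are needed here, because the engine reads UTF-8 directly and the font is not limited to 256 slots.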
Short answer: You can use \DeclareUnicodeCharacter{23B5}{\textvisiblespace}. Or, instead of ⎵ (U+23B5 BOTTOM SQUARE BRACKET), you can use ␣ (U+2423 OPEN BOX), which is already defined as \textvisiblespace.
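For the second option, a minimal sketch might look like this (compiled with pdflatex; the venturis font package from the question is left out here only to keep the example small):

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\title{Foo␣Bar} % U+2423 OPEN BOX is already mapped to \textvisiblespace
\author{Myself}
\date{}
\begin{document}
\maketitle
\end{document}
```

No \DeclareUnicodeCharacter line is needed in this case.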
Long answer: When you type ⎵ in your input file and save it with UTF-8 encoding (which is probably the default in your editor), this character U+23B5 BOTTOM SQUARE BRACKET gets stored as the sequence of bytes E2 8E B5. Then when TeX reads the file, because of the \usepackage[utf8]{inputenc} in the preamble, it knows that when it sees those bytes, it should interpret them as the character U+23B5. This is great, but quite separately TeX needs to know what to do with that character.
You can in fact give any instruction for any character. In your case it's as simple as picking a symbol and typesetting it. You can look it up in The Comprehensive LaTeX Symbol List (probably available on your computer with texdoc symbols-a4) and see that the symbol is available (in fact under “Frequently Requested Symbols”) as \textvisiblespace. So you can use that:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[lf]{venturis}
\usepackage[T1]{fontenc}
\DeclareUnicodeCharacter{23B5}{\textvisiblespace}
\title{Foo⎵Bar} % this is fine now
\author{Myself}
\date{}
\begin{document}
\maketitle
\end{document}
This produces a title in which a visible-space symbol appears between “Foo” and “Bar”.
(Instead of \DeclareUnicodeCharacter{23B5}{\textvisiblespace} you can also use \newunicodechar{⎵}{{\textvisiblespace}} from the newunicodechar package.)
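A minimal sketch of the newunicodechar variant, compiled with pdflatex (again without the venturis package, purely to keep it short):

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{newunicodechar}
% Takes the character itself rather than its hexadecimal code point
\newunicodechar{⎵}{{\textvisiblespace}}
\begin{document}
Foo⎵Bar
\end{document}
```

The advantage is mostly readability: the definition shows the actual character, so there is no code point to look up.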
In fact, some common definitions come with a typical TeX distribution: specifically, on my computer /usr/local/texlive/2017/texmf-dist/tex/latex/base/utf8.def contains definitions for ©, ª, ®, º, ˆ, ˜, ZWNJ, …, ™, ␣:
\DeclareUnicodeCharacter{00A9}{\textcopyright}
\DeclareUnicodeCharacter{00AA}{\textordfeminine}
\DeclareUnicodeCharacter{00AE}{\textregistered}
\DeclareUnicodeCharacter{00BA}{\textordmasculine}
\DeclareUnicodeCharacter{02C6}{\textasciicircum}
\DeclareUnicodeCharacter{02DC}{\textasciitilde}
\DeclareUnicodeCharacter{200C}{\textcompwordmark}
\DeclareUnicodeCharacter{2026}{\textellipsis}
\DeclareUnicodeCharacter{2122}{\texttrademark}
\DeclareUnicodeCharacter{2423}{\textvisiblespace}
but a definition of ⎵ as \textvisiblespace is not among them.
Alternative: If you use a Unicode-aware TeX engine like XeTeX or LuaTeX (compile your file with xelatex or lualatex rather than pdflatex), then you can simply use the ⎵ character from a system font that contains it. For example:
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif} % Sets this font for the entire document
\begin{document}
Foo⎵Bar
\end{document}
or
\documentclass{article}
\usepackage{fontspec}
\newfontfamily{\symbolfont}{Noto Sans Symbols}
\usepackage{newunicodechar}
\newunicodechar{⎵}{{\symbolfont⎵}} % Use this font for this specific character
\begin{document}
Foo⎵Bar
\end{document}
Watch out that TeX has a rather serious (IMO) usability/design flaw here: if the character is missing in the font, then it doesn't cause an error but simply writes a missing character warning to the .log file. You can use \tracinglostchars=2 to have the message shown in the terminal, but you still need to watch out for it.
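As a sketch of where that setting might go, assuming the same FreeSerif setup as in the first example above:

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
\usepackage{fontspec}
\tracinglostchars=2 % also print "Missing character" messages on the terminal
\setmainfont{FreeSerif}
\begin{document}
Foo⎵Bar
\end{document}
```

I believe newer distributions (TeX Live 2021 onward) also accept a value of 3, which turns the missing-character message into an error, but check that against your own installation before relying on it.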
Best Answer
Probably this will miss more complex scenarios, but from here:
If you do not want to define everything, you need to switch to a native Unicode TeX engine, for example xelatex. If you compile with xelatex the following code:
to obtain:
where I am using the unicode-math package; I found the math-mode µ code point at Shapecatcher.