[Tex/LaTex] Using both unicode and russian in tex source

cyrillic unicode

EDIT: I ended up using XeTeX (with an auto-refreshing Evince viewer), as suggested by @Andrey Vihrov. I am, however, accepting the most upvoted answer.

I am lost. Looked all over tex.stackexchange and can't find a good solution. Only suggestions to use babel or xetex…

I want to be able to use BOTH Russian (Cyrillic) and other Unicode characters in my LaTeX source files. For example, this does not compile:

In Dahl’s dictionary there is a similar sounding word “дуван”...
“C’est auprès de son père, écrivain de la nation pisane à la douane de Bougie, à
la fin du douzième siècle[todo], que le célèbre mathématicien Léonard Bonacci...

If I use babel and set it to Russian for the above, the compiler pukes on the other non-Russian Unicode characters. If I set babel to English, then the Russian does not work:

Package inputenc Error: Unicode char \u8:д not set up for use with LaTeX.

Please note, I don't really care about hyphenation and such; I can do that manually if need be. I just want my source documents to compile with LaTeX.

The problem is that my main document is typeset in English, with a lot of quotes in languages from all across Europe.

Is this possible with only LaTeX (dvi)? Or must I resort to something else? I would very much prefer to stay with LaTeX, as all my compile tools are set up for it.

Either way, I would appreciate some advice.

Best Answer

You need to tell LaTeX which languages you intend to use (T2A is the font encoding for Cyrillic):

\documentclass{article}
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,french,english]{babel}

\begin{document}

In Dahl’s dictionary there is a similar sounding word “\foreignlanguage{russian}{дуван}”.

\begin{otherlanguage*}{french}
“C’est auprès de son père, écrivain de la nation pisane à la douane de Bougie, à
la fin du douzième siècle[todo], que le célèbre mathématicien Léonard
Bonacci
\end{otherlanguage*}

\end{document}


The main problem is that fonts have only 256 slots available for glyphs and writing in French and Russian requires more than 256 glyphs. (Maybe this is not strictly true, but even if the number of glyphs were less than 256, a special output encoding for French and Russian would be needed; what about German and Russian, Polish and Russian, or a mixing of three languages?)
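To make the encoding point concrete, here is a minimal sketch (for pdfLaTeX or LaTeX with dvi output) that switches the output font encoding by hand for a single word, without loading babel's Russian support. The macro name `\InCyrillic` is hypothetical and not part of any package, and the sketch assumes the T2A Computer Modern fonts are installed. No Russian hyphenation patterns are applied this way.

```latex
\documentclass{article}
\usepackage[T2A,T1]{fontenc}   % T1 is listed last, so it is the default encoding
\usepackage[utf8]{inputenc}

% Hypothetical helper: typeset its argument in the T2A (Cyrillic) encoding.
% The extra braces keep the encoding switch local to the argument.
\newcommand{\InCyrillic}[1]{{\fontencoding{T2A}\selectfont #1}}

\begin{document}
In Dahl's dictionary there is a similar sounding word ``\InCyrillic{дуван}''.
\end{document}
```

Outside the macro the document stays in T1, so accented Latin letters keep working; inside it, only glyphs available in T2A can be typeset.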

You can always define an abbreviation, say \RUS, for typesetting isolated words in Russian:

\newcommand{\RUS}[1]{\foreignlanguage{russian}{#1}}

(or, more efficiently, \newcommand{\RUS}{\foreignlanguage{russian}}). You have the benefit that hyphenation will be correct.
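Combined with the preamble from the start of this answer, a minimal document using the abbreviation might look like the following sketch. Only the languages needed for this one sentence are loaded here, which is a simplification for the example.

```latex
\documentclass{article}
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,english]{babel}   % the last language listed (english) is the main one

% Shorthand for isolated Russian words: switches encoding and hyphenation.
\newcommand{\RUS}[1]{\foreignlanguage{russian}{#1}}

\begin{document}
In Dahl’s dictionary there is a similar sounding word “\RUS{дуван}”.
\end{document}
```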

A different approach requires using an OpenType font that contains all the needed glyphs, but of course XeLaTeX or LuaLaTeX with fontspec are required.
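As a rough sketch of that second approach, the sample text from the question can be compiled with xelatex or lualatex as shown below. The font choice is an assumption: CMU Serif is used only because it covers both Latin and Cyrillic, and any installed OpenType font with the needed glyphs should do. No language-specific hyphenation is set up in this sketch.

```latex
% Compile with xelatex or lualatex, not pdflatex.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{CMU Serif}   % assumption: an installed font with Latin and Cyrillic glyphs

\begin{document}
In Dahl’s dictionary there is a similar sounding word “дуван”.

“C’est auprès de son père, écrivain de la nation pisane à la douane de Bougie”
\end{document}
```

No `inputenc` or `fontenc` is needed here, because these engines read UTF-8 natively and the font is not limited to 256 slots.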

Related Solutions

[Tex/LaTex] Problems with Cyrillic fonts in XeTeX

This is not an answer, but a comment can't contain an MWE and pictures

It would be helpful if you did some more work in diagnosing where the problem occurs. Simplifying your example and compiling the code with XeLaTeX gives no issues for me.

\documentclass{article}
\usepackage{fontspec}
\setmainfont{CMU Serif}  
\begin{document}
\section{Свои проекты и вклады}
\emph{Докладчик}Доклад <<С++ без new и delete>>.
\end{document}


[Tex/LaTex] How to use unicode symbols in TeX source

Probably this will miss more complex scenarios, but from here:

%%
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\DeclareUnicodeCharacter{B5}{\ifmmode\mu\else\textmu\fi}
%
% use http://shapecatcher.com/ to find the char
% or https://w3c.github.io/xml-entities/unicode-names.html

\begin{document}

This is a textmode µ and this a math mode one: $µ_µ$.

\end{document}
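The same mechanism extends to other characters, one declaration per code point. The sketch below maps two relation symbols; these particular mappings are illustrative choices, not part of the original answer. The first argument of `\DeclareUnicodeCharacter` is the hexadecimal code point.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

% Illustrative mappings:
% U+2264 LESS-THAN OR EQUAL TO and U+2265 GREATER-THAN OR EQUAL TO.
% \ensuremath makes them usable in both text and math mode.
\DeclareUnicodeCharacter{2264}{\ensuremath{\leq}}
\DeclareUnicodeCharacter{2265}{\ensuremath{\geq}}

\begin{document}
If $a ≤ b$ and $b ≤ a$ then $a = b$; in text mode, x ≥ 0 also works.
\end{document}
```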

If you do not want to define everything, you need to switch to a native Unicode TeX engine, such as xelatex. You can compile the following code with xelatex:

%%
\documentclass{article}
\usepackage{fontspec}
\usepackage{unicode-math}
%
% use http://shapecatcher.com/ to find the char
% or https://w3c.github.io/xml-entities/unicode-names.html

\begin{document}

Is the character ẁ used in some language?

This is a textmode µ and this a math mode one: $𝜇_𝜇$. 
(But be careful, math mu is a different unicode codepoint,
\texttt{1D707}.)

\end{document}

Here I am using the unicode-math package; I found the math-mode µ code point at shapecatcher.
