I have a document template that takes text from a source file via pandoc (which doesn't matter here) and inserts it in place of $body$. The source document contains a single character that shows up in a hex editor as "C2 AD". As I discovered, this is the UTF-8 encoding of U+00AD, the soft hyphen. I tried to handle it with the \DeclareUnicodeCharacter command, like this:
\documentclass[a4paper,10pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T2A]{fontenc}
\DeclareUnicodeCharacter{00AD}{\-}
\begin{document}
$body$
\end{document}
but XeLaTeX still returned an error: "! Package inputenc Error: Keyboard character used is undefined (inputenc) in inputencoding `utf8'".
When I tried to use utf8x instead of utf8:
\documentclass[a4paper,10pt]{article}
\usepackage[utf8x]{inputenc}
\usepackage[T2A]{fontenc}
\DeclareUnicodeCharacter{00AD}{\-}
\begin{document}
$body$
\end{document}
it returned: "! LaTeX Error: Missing \begin{document}".
What am I doing wrong?
Best Answer
\usepackage[utf8]{inputenc} or \usepackage[utf8x]{inputenc} is needed only for TeX engines that do not support UTF-8 natively. With those engines, TeX sees the two bytes C2 and AD, and utf8.def or utf8x.def makes C2 active so that it catches AD and then prints the symbol, executes \-, or does whatever else was declared.
In XeTeX, the bytes C2 AD become the single "big" character AD. "Big" character means that characters with character codes >= 256 are possible. Then you can make the character active and give it the meaning you requested:
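A minimal sketch of that approach, not a definitive preamble. It assumes the document is compiled with XeLaTeX, and that inputenc and fontenc are dropped in favour of fontspec (an assumption here; the question does not load fontspec, and a font covering the document's script still has to be selected). The character is written as ^^ad so that the invisible soft hyphen does not have to be typed literally:

```latex
\documentclass[a4paper,10pt]{article}
% Assumption: XeLaTeX with fontspec replaces inputenc/fontenc.
\usepackage{fontspec}
% Make U+00AD (soft hyphen) an active character ...
\catcode"AD=\active
% ... and let it expand to a discretionary hyphen.
% ^^ad is TeX's notation for the character with code "AD.
\protected\def^^ad{\-}
\begin{document}
$body$
\end{document}
```

The \catcode line has to come before the \def line, because ^^ad is read as an active character only once its category code has been changed. \protected keeps the definition from being expanded prematurely when the text ends up in moving arguments such as section titles.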