[TeX/LaTeX] Problem with defining a new Unicode character

errors unicode xetex

I have a document that takes text from a source file via pandoc (which shouldn't matter here) and inserts it in place of $body$. The source document contains exactly one troublesome symbol, which a hex editor shows as the bytes "C2 AD". As I discovered, this is the UTF-8 encoding of U+00AD, the soft hyphen. I tried to handle it with the \DeclareUnicodeCharacter command, like this:

\documentclass[a4paper,10pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T2A]{fontenc}
\DeclareUnicodeCharacter{00AD}{\-}

\begin{document}
$body$
\end{document}

but XeLaTeX still returned an error: "! Package inputenc Error: Keyboard character used is undefined (inputenc) in inputencoding `utf8'".

When I tried to use utf8x instead of utf8:

\documentclass[a4paper,10pt]{article}
\usepackage[utf8x]{inputenc}
\usepackage[T2A]{fontenc}
\DeclareUnicodeCharacter{00AD}{\-}

\begin{document}
$body$
\end{document}

it returned: "! LaTeX Error: Missing \begin{document}".

What am I doing wrong?

Best Answer

\usepackage[utf8]{inputenc} or \usepackage[utf8x]{inputenc} is needed only for TeX engines that do not support UTF-8 themselves. Such an engine sees the two separate bytes C2 and AD, and utf8.def or utf8x.def makes C2 active so that it catches the following AD and then prints the symbol, executes \-, or does whatever the character was declared to do.

XeTeX reads UTF-8 itself, so the bytes C2 AD become the single "big" character AD (U+00AD) and inputenc is not needed. "Big" means that characters with character codes >= 256 are possible. You can then make the character active and give it the meaning you requested:

% XeTeX or LuaTeX
\catcode`\^^ad=\active
\let^^ad=\-
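
For context, here is how those two lines might sit in a complete document. This is only a minimal sketch under a few assumptions: it is compiled with xelatex or lualatex, inputenc and fontenc from the question are dropped, and fontspec is loaded in their place (fontspec is my addition, not part of the question, and a font with Cyrillic glyphs would still have to be selected if the text needs one). The body uses the ^^ad notation only so that the otherwise invisible soft hyphen is visible in the listing; in the pandoc template that line would be $body$.

```latex
% Compile with xelatex or lualatex (assumption: a Unicode-aware engine)
\documentclass[a4paper,10pt]{article}
\usepackage{fontspec} % assumption: used here instead of inputenc/fontenc

% Make U+00AD (soft hyphen) an active character ...
\catcode`\^^ad=\active
% ... and let it behave like \- (a discretionary hyphen)
\let^^ad=\-

\begin{document}
% ^^ad stands for a literal U+00AD in the input
A very long word such as in^^adcom^^adpre^^adhen^^adsi^^adbil^^adi^^adty
may now be broken only at the marked positions.
\end{document}
```

Because the character is made active after the packages are loaded, the change should not interfere with package code; whether it fits a particular template depends on what else in that template touches U+00AD.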