UTF-8 issue with Soft Hyphen (U+00AD) in both LuaLaTeX and XeLaTeX

unicode utf8

I have a very simple test case that displays all of the Latin-1 printable characters (U+0021–U+007E, U+00A1–U+00FF). It works correctly for every character except U+00AD. It fails with both LuaLaTeX and XeLaTeX.

It does not matter whether I load unicode-math or set the monospace font with \setmonofont; the result is the same. If I open the source file in Notepad, everything displays correctly, whichever font I choose (as long as the font supports Latin-1).

Here is the test case:

\documentclass[10pt]{article}
\begin{document}
\begin{verbatim}
! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~

¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
\end{verbatim}
\end{document}

The character does not show correctly in the question, so an image of the source, set in a monospace font, is also included.

Sample Test Code

And here is the output. On the first line above the ASCII range (the line starting with the inverted exclamation mark), the hyphen that should follow the "not" sign (¬) is omitted. This may be correct outside verbatim mode, since U+00AD is a soft ("weak") hyphen when it is embedded in a word. However, that processing should not occur in verbatim mode. Note that the character is also dropped outside verbatim mode, even though it is not embedded in a word there and so should not be omitted (as Notepad shows).

Test Output

Best Answer

Whatever Notepad shows should not be taken as an absolute rule. For instance, the editors I have on my machine don't show U+00AD.


Not even if I ask them to show invisible characters.


However, you might decide to show the character inside verbatim, and it's not difficult: hook into the code that sets up verbatim and add your own settings there.

\documentclass[10pt]{article}
\usepackage{etoolbox}

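\makeatletter
% Run \@otherstuff right after \@noligs whenever verbatim is set up
\patchcmd{\@verbatim}{\@noligs}{\@noligs\@otherstuff}{}{}
% Inside verbatim only, make U+00AD an active character
\def\@otherstuff{\catcode"AD=\active}
\begingroup
\catcode"AD=\active
% What the active U+00AD prints: a question mark over a hyphen
\gdef^^ad{{\ooalign{-\cr\hidewidth?\hidewidth\cr}}}
\endgroup
\makeatother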


\begin{document}
\begin{verbatim}
  ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~

  ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
\end{verbatim}
\end{document}


The appearance is up to you; here it is a question mark superimposed on a hyphen.
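
To change the appearance, only the replacement text of the active character needs to change. The following is a minimal sketch, assuming a plain hyphen is what you want printed for U+00AD. It reuses the same patch and the same \@otherstuff helper as above. The middle character in the verbatim line is meant to be a literal U+00AD, which many editors will not display.

```latex
\documentclass[10pt]{article}
\usepackage{etoolbox}

\makeatletter
% Same hook as before: activate U+00AD when verbatim is set up
\patchcmd{\@verbatim}{\@noligs}{\@noligs\@otherstuff}{}{}
\def\@otherstuff{\catcode"AD=\active}
\begingroup
\catcode"AD=\active
% Assumption: U+00AD should print as an ordinary hyphen
\gdef^^ad{-}
\endgroup
\makeatother

\begin{document}
\begin{verbatim}
¬ ­ ®
\end{verbatim}
\end{document}
```

Because the category code change happens only inside \@verbatim, U+00AD keeps its usual behavior in normal text.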