Is there a difference between using ‘~’ and a Unicode non-breaking space?

tilde unicode

Does (Lua)LaTeX treat these two characters the same, or is there some difference? I've gotten used to typing Unicode NBSPs with the Shift-Space shortcut, and that is easier for me than typing the tilde character. Should I change the behavior of the non-breaking space character (U+00A0), or is that redundant?

Best Answer

In pdfLaTeX and the latex command on modern distributions, they are the same: both evaluate to \nobreakspace. In LuaLaTeX and XeLaTeX, they are different by default, but you can change that.

The inputenc package parses the no-break space character (in each encoding that has it) as \nobreakspace. In the Latin-1 encoding, for example, the definition is

\DeclareInputText{160}{\nobreakspace}

And for the default, UTF-8, it is

\DeclareUnicodeCharacter{00A0}{\nobreakspace}

The LaTeX kernel also makes ~ an active character, defined as

\def~{\nobreakspace{}}
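Since both definitions end in \nobreakspace, the two inputs should typeset identically under pdfLaTeX. A minimal sketch for checking this, assuming the file is compiled with pdflatex; the ^^c2^^a0 notation is only used here to spell out the two UTF-8 bytes of U+00A0 visibly, and a literal no-break space typed in the editor should behave the same way:

```latex
% Compile with pdflatex (not LuaLaTeX or XeLaTeX).
\documentclass{article}
\usepackage[utf8]{inputenc} % explicit for older kernels; UTF-8 is the default on recent ones

\begin{document}
% ^^c2^^a0 is the UTF-8 byte sequence for U+00A0
foo~bar^^c2^^a0baz
\end{document}
```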

In LuaLaTeX or XeLaTeX, ~ still evaluates to \nobreakspace, which is defined in the LaTeX kernel as

\DeclareRobustCommand{\nobreakspace}{%
   \leavevmode\nobreak\ }
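To make the pieces of that definition easier to follow, here is a rough equivalent with each step annotated. The name \mynbsp is hypothetical and only for illustration, and \penalty10000 is written out in place of \nobreak:

```latex
\documentclass{article}

% \mynbsp is a made-up name for this sketch, not a kernel command
\newcommand*{\mynbsp}{%
  \leavevmode   % start a paragraph if TeX is still in vertical mode
  \penalty10000 % what \nobreak does: forbid a line break at this point
  \ }           % control space: a normal, stretchable interword space

\begin{document}
foo\mynbsp bar
\end{document}
```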

However, the character U+00A0 is interpreted literally: it is typeset as whatever the font provides at that code point. (It still searches and copies from the PDF as a space character, though.) You can clearly see the difference with the test file

\documentclass{article}

\begin{document}
foo~bar{^^a0}baz
\end{document}

Latin Modern sample

In particular, U+00A0 has a fixed width set by the font, whereas \nobreakspace uses the same stretchable interword spacing as the rest of the line, so you might want the fixed-width non-breaking space for a monospace font. The no-break space character, ^^a0, \symbol{"A0} and \char"A0 all give the same output.
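The three ASCII notations mentioned above can be compared side by side. This is a small sketch, assuming LuaLaTeX or XeLaTeX with the default fonts; the \quad separators are only there to keep the samples apart:

```latex
% Compile with lualatex or xelatex.
\documentclass{article}

\begin{document}
% All three should place the font's own U+00A0 glyph between the words
foo{^^a0}bar \quad foo\symbol{"A0}bar \quad foo\char"A0 bar
\end{document}
```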

If you would rather have the two behave identically, you can redefine U+00A0 to evaluate to \nobreakspace:

\documentclass{article}
\usepackage{fontspec}
\usepackage{newunicodechar}

\newunicodechar{^^a0}{\nobreakspace}

\begin{document}
foo~bar{^^a0}baz
\end{document}

Latin Modern sample
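If you would rather not load newunicodechar, the same effect can probably be had by making the character active by hand. This is a sketch under the assumption that nothing else in the document relies on U+00A0 having its default category code; it is an alternative written for illustration, not what the package is guaranteed to do internally:

```latex
% Compile with lualatex or xelatex.
\documentclass{article}
\usepackage{fontspec}

% Make U+00A0 an active character, then define it like ~
\catcode"A0=\active
\protected\def^^a0{\nobreakspace}

\begin{document}
foo~bar{^^a0}baz
\end{document}
```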