Is there a difference between using ‘~’ and a Unicode non-breaking space?

tilde unicode

Does (Lua)LaTeX treat these two characters the same, or is there some difference? I've gotten used to typing Unicode NBSPs with the Shift-Space shortcut, and that is easier for me than using the tilde character. Should I change the behavior of the non-breaking space character (U+00A0), or is that redundant?

Best Answer

In PDFLaTeX and the latex command on modern distributions, they are the same. Both evaluate to \nobreakspace. In LuaLaTeX and XeLaTeX, they are different by default, but you can change that.

The inputenc package parses the no-break space character (in each encoding that has it) as \nobreakspace. In the Latin-1 encoding, for example, the definition is

\DeclareInputText{160}{\nobreakspace}

And for the default, UTF-8, it is

\DeclareUnicodeCharacter{00A0}{\nobreakspace}

The LaTeX kernel also makes ~ an active character, defined as

\def~{\nobreakspace{}}

In LuaLaTeX or XeLaTeX, ~ still evaluates to \nobreakspace, which is defined in the LaTeX kernel as

\DeclareRobustCommand{\nobreakspace}{%
   \leavevmode\nobreak\ }

However, the character U+00A0 is interpreted literally. (Although it still searches and copies from the PDF as a space character.) You can clearly see the difference with the test file

\documentclass{article}

\begin{document}
foo~bar{^^a0}baz
\end{document}

In particular, U+00A0 is a glyph whose fixed width is set by the font, whereas \nobreakspace uses the same stretchable interword spacing as the rest of the line, so you might prefer the fixed-width non-breaking space in a monospace font. The no-break space character, ^^a0, \symbol{"A0} and \char"A0 all give the same output.
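
The difference in width behaviour is easiest to see when a line is forced to stretch. The following is a minimal sketch, not part of the original answer, assuming LuaLaTeX or XeLaTeX with the default font (which is assumed here to contain a U+00A0 glyph). \makebox with the [s] option is used only as a convenient way to stretch the interword glue.

```latex
% Compile with lualatex or xelatex.
\documentclass{article}
\begin{document}
% [s] stretches the interword glue so that each box fills the text width.
% Underfull \hbox warnings are expected and harmless here.

% Line 1: ~ is glue, so it stretches like the ordinary space.
\noindent\makebox[\linewidth][s]{foo~bar baz}\par

% Line 2: U+00A0 is a glyph from the font, so its width stays fixed.
\noindent\makebox[\linewidth][s]{foo{^^a0}bar baz}\par
\end{document}
```

In the first line both gaps should widen equally. In the second, only the ordinary space between bar and baz should stretch, while the U+00A0 gap keeps the width given by the font.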

However, you could redefine U+00A0 to evaluate to \nobreakspace:

\documentclass{article}
\usepackage{fontspec}
\usepackage{newunicodechar}

\newunicodechar{^^a0}{\nobreakspace}

\begin{document}
foo~bar{^^a0}baz
\end{document}

Related Solutions

[Tex/LaTex] Replacing Unicode non-breakable spaces by normal spaces

\usepackage{newunicodechar}
\newunicodechar{ }{ }

In the first argument you put a NO-BREAK SPACE (U+00A0), in the second a normal space. A better definition would be

\newunicodechar{ }{~}

(again the space is NO-BREAK SPACE), so this unbreakable space will stretch or shrink with the other spaces in the line. Of course use the first one if you want a normal space, ça va sans dire. :)
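
Because a NO-BREAK SPACE is invisible and easily lost when code is copied, the same mapping can also be written with ASCII characters only. The following is a hedged sketch for pdfLaTeX, not taken from the original answer: it uses the kernel's \DeclareUnicodeCharacter (the command quoted above for the default UTF-8 definition) in place of \newunicodechar, and writes the character in the body as ^^c2^^a0, TeX's notation for the two UTF-8 bytes of U+00A0.

```latex
% Compile with pdflatex (this sketch is not meant for LuaLaTeX or XeLaTeX).
\documentclass{article}
\usepackage[utf8]{inputenc} % already the default in recent kernels

% Map U+00A0 to an ordinary, breakable space ...
\DeclareUnicodeCharacter{00A0}{ }
% ... or, to keep it unbreakable but stretchable, use this instead:
% \DeclareUnicodeCharacter{00A0}{\nobreakspace}

\begin{document}
% ^^c2^^a0 stands for the UTF-8 bytes of U+00A0;
% typing the character itself is equivalent.
foo^^c2^^a0bar
\end{document}
```

With the first definition active, the gap between foo and bar should behave like a normal space, including allowing a line break.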

[Tex/LaTex] UTF-8 (but not in current font) character in newcommand

There are a couple of problems:

There is already an action defined for ¦, precisely \IeC{\textbrokenbar}, which is kind of expected; thus \newcommand will give you the error.
If you do
```
\expandafter\newcommand\csname u8:\detokenize{∙}\endcsname{\kern1pt}
```
you're not defining the macro \∙, but a meaning for the Unicode character ∙. Since ∙ is represented in UTF-8 by the triple E2 88 99, TeX will see \^^e2 and the error message uses some representation of the three bytes.

With newunicodechar you don't have to do anything special:

% -*- coding: utf-8 -*-
\documentclass[11pt,english]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{babel}
\usepackage{newunicodechar}
\newunicodechar{¦}{\kern20pt} % exaggerated to show the effect
\begin{document}
A¦A
\end{document}

The output shows the two letters separated by a 20pt gap, and the log file will report

Package newunicodechar Warning: Redefining Unicode character on input line 11.

which would be

Package newunicodechar Warning: Redefining Unicode character; it meant
(newunicodechar)                ***  \IeC {\textbrokenbar }  ***
(newunicodechar)                before your redefinition on input line 11.

if the verbose option is used (\usepackage[verbose]{newunicodechar}).

Here's the relevant part from the documentation of newunicodechar.

The package provides only one command, \newunicodechar, which must be called with two arguments:

\newunicodechar{<char>}{<code>}

where <char> is the Unicode character to which we need to give a meaning and <code> is that meaning, that is the LaTeX code that will be substituted to the character.
