[Tex/LaTex] How to break long URLs using common hyphenation but adding a line feed indicator

hyperref, hyphenation, line-breaking, punctuation, urls

I have seen several questions, like this one, that ask how to break URLs across lines.

  1. I wonder whether the words in a URL can be hyphenated as in normal text, i.e., broken with the help of the hyphenation engine, but with a special character inserted instead of a hyphen (see example 1).

    • Indicating line feed: To avoid misreading a broken URL, I would like a special character, such as the carriage return symbol, to appear at the break position.
    • Hyphens vs. dashes: I do not want the hyphenation engine to insert new hyphens, which could be misinterpreted as part of the URL.
  2. If the hyphenation engine breaks the URL at a dash that is part of the URL, the special character must still be inserted (see example 2).

  3. If the hyphenation engine breaks the URL at a slash that is part of the URL, the special character must still be inserted (see example 3).

The desired hyphenation should work in paragraphs, footnotes and in the bibliography.

Examples:

(1) http://www.w3.org/hypertext-transport-protocol/secure/test/appli↩
cationformular.html

(2) http://www.w3.org/hypertext-transport-↩
protocol/secure/test/applicationformular.html

(3) http://www.w3.org/hypertext-transport-protocol/secure/↩
test/applicationformular.html

Related work:

  • The solution posted by Peter Grill introduced a new command. I want to use the existing \url command. Also, his solution breaks words at every character, while I want to rely on the decision of the hyphenation engine using correct hyphenation.

Best Answer


This works for the T1 and OT1 encodings. Other encodings would need a modification: the code basically needs some invisible character to use as a fake hyphenation character.

As the examples show, no arrow is added if the URL is not broken (first example). An arrow is added if the URL breaks after one of the URL syntax characters / or . (second example) or at a hyphenation point, such as exam-ple (third example).

As posted, the arrows stick out into the right margin. If you prefer them inside the text block, remove the \rlap from the \discretionary.

Also, as posted, this defines \brkurl, whereas the question asks for the command to be called \url. Just globally delete brk if that is desired.

\documentclass{article}

%\tracingonline1
%\showboxbreadth=200
%\showboxdepth=200

\begin{document}

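% Walk through the URL one token at a time (the list ends at \relax) and
% put a space on both sides of every "/" and "." so that these positions
% become legal break points.
\def\addurlspace#1{%
\ifx\relax#1%
\else
\ifx/#1\space\fi
\ifx.#1\space\fi
#1%
\ifx/#1\space\fi
\ifx.#1\space\fi
\expandafter\addurlspace
\fi}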



\makeatletter

\@namedef{OT1-zwidthchar}{255}
\@namedef{T1-zwidthchar}{"17}

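\def\brkurl#1{%
% \hyphenchar assignments are always global, so save the current value.
\edef\savedhchar{\the\hyphenchar\font}%
\global\setbox1\hbox{}%
% Typeset the URL in a very narrow scratch box so that TeX takes the
% breaks it can (spaces and hyphenation points); silence the warnings.
\setbox0=\vbox{\hsize=2pt\rightskip=0pt plus 1fill
\hfuzz\maxdimen
\tracinglostchars0
\overfullrule0pt
% Use the invisible character of the current encoding as the hyphen.
\hyphenchar\font=\csname \f@encoding-zwidthchar\endcsname
\noindent \hskip0pt \addurlspace #1\relax
\par
% Take the lines off the scratch box from last to first and rebuild the
% URL in box 1, with an arrow discretionary after each line.
\loop
\setbox4 \lastbox
\ifvoid4 \else
\global\setbox1\hbox{\unhbox4\unskip\unskip\discretionary{\hbox{\rlap{$\leftarrow$}}}{}{}\unhbox1}%
\unskip
\unskip
\unpenalty
\unskip
\repeat
}%
% Put the rebuilt URL into the current paragraph, then restore the hyphen.
\unhbox1
\hyphenchar\font\savedhchar
\relax}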

\makeatother





 some text \brkurl{http://www.example.com/this/directory/here}
 some text \brkurl{http://www.example.com/this/directory/here} some text 
 some text \brkurl{http://www.example.com/this/directory/here} some text 

\end{document}
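
The arrows in the code above come from \discretionary, whose first argument is only typeset when TeX actually breaks the line at that point. The following is a minimal sketch of that mechanism in isolation, not part of the answer itself. The helper name \brkmark is hypothetical, and \hookleftarrow is used instead of the answer's \leftarrow only because it is closer to the ↩ symbol in the question.

```latex
\documentclass{article}
\begin{document}

% Hypothetical helper, not part of the answer: a break point that prints
% a marker at the end of the line, but only if the line is broken there.
\newcommand\brkmark{\discretionary{\hbox{$\hookleftarrow$}}{}{}}

% Wide box: everything fits on one line, so no marker is printed.
\parbox{8cm}{\raggedright aaaaaaaaaaaaaaaa\brkmark bbbbbbbbbbbbbbbb}

\bigskip

% Narrow box: the line has to break at the discretionary, so the marker shows.
\parbox{4cm}{\raggedright aaaaaaaaaaaaaaaa\brkmark bbbbbbbbbbbbbbbb}

\end{document}
```

Here the marker takes up space at the end of the line. Wrapping it in \rlap, as the answer does, gives it zero width, so it no longer counts toward the line length and ends up to the right of the text.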