[TeX/LaTeX] Bug with page numbers when using the soul package

debugging, page-numbering, soul

I recently ran into a curious bug when using the soul package, and I
managed to boil it down to the following example. The problem is that
the page numbers get messed up. In the example below, they should be
1, 2, and 3. In fact, they are 1, 0, and 1, in that order. It appears
that, in some cases, a blank line inside the argument of \hl causes
this problem: removing the blank line inside the \hl{} in the example
below makes the page numbers come out right. It is also possible to fix
the problem in other ways.

Now that I have discovered this, I can work around the problem
in my paper. However, I'm curious as to what is causing it,
so I thought I might as well post the question. I didn't try asking
the creator of soul, since I think the package is probably unmaintained;
the last update was a long time ago. This is with Debian squeeze, using TeX Live 2009 and soul.sty version 2.4.

I'm using the LaTeX macro \dummytext below to generate dummy text,
courtesy of the kind folks on this site; see "Generating dummy text
programmatically using TeX/LaTeX". The specific solution I'm using is
by Martin Scharrer.

\documentclass[10pt]{article}
\usepackage{soul}
\usepackage{pgffor}
\usepackage{setspace}
\doublespacing
\newcommand\dummytext[3]{%
    \foreach \n in {1,...,#3} {%
        \foreach \k in {1,...,#2} {%
            #1%
        }%
        \\
    }%
}
\pagestyle{myheadings}
\begin{document}
\dummytext{HelloWorld}{5}{52}

\dummytext{HelloWorld}{5}{17}
\hl{

x
}
\end{document}

Best Answer

Problem

The problem appears to be that soul uses \count0 as a local scratch register. While it is, as a general rule, safe to use the even \counts locally (see, e.g., this answer), \count0 is special because it holds the page number. Using it in this way is fine only if there is zero chance of a page break being triggered within the scope where it is used (as described in the same answer).

This is exactly what you see happening here. While the page break does not occur within the highlighted text itself, it is triggered by the paragraph break in the argument of \hl. Until this paragraph break, TeX was still considering putting the first bit of highlighted text on the previous page.
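
To make the role of \count0 concrete, here is a minimal sketch (not taken from soul itself) showing that \thepage and \count0 refer to the same register in LaTeX, and that a local assignment to \count0 changes what \thepage prints until the group ends. The value 42 is arbitrary and purely illustrative. As far as I understand it, the trouble starts when a page is shipped out while such a local assignment is active: the output routine then steps the page counter globally, starting from the scratch value.

```latex
\documentclass{article}
\begin{document}
Page number via \verb|\thepage|: \thepage.
The same register read directly: \the\count0.

\begingroup
  \count0=42 % local "scratch" use of the page register (illustrative value)
  Inside the group, \verb|\thepage| prints \thepage.
\endgroup
After the group, \verb|\thepage| prints \thepage\ again.
\end{document}
```

No page break happens inside the group in this sketch, so the page number is restored afterwards. The bug in the question is the case where a break does happen at the wrong moment.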

Fix

The problem is fixed by adding the following lines to your preamble.

\makeatletter %% <- make @ usable in command sequences
\newcount\SOUL@minus
\makeatother  %% <- revert @

This reserves a new count register for soul to use instead of the one it is using by default.
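
If you want to check which register soul is actually using, a quick sketch along these lines should do it. It assumes that your soul version defines \SOUL@minus at all; a version that does not will simply report it as undefined. The "Before"/"After" labels are only for illustration, and the messages end up in the terminal output and the .log file.

```latex
\documentclass{article}
\usepackage{soul}
\makeatletter
% For the soul version discussed here, this is expected to name \count0.
\typeout{Before: \meaning\SOUL@minus}
\newcount\SOUL@minus
% After the fix, this should name a freshly allocated \count register.
\typeout{After: \meaning\SOUL@minus}
\makeatother
\begin{document}
\hl{test}
\end{document}
```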

To solve the problem at the package level, the line

\countdef\SOUL@minus\z@

in soul.dtx would need to be replaced by (for instance)

\newcount\SOUL@minus

There are a few other registers used by soul for which allocating a new register would probably also have been more appropriate. (See, e.g., this question.)

I've created an issue on the soul GitHub page, but the repository appears to be inactive.

Demonstration

The page numbers in the document below are correct with the aforementioned fix in place. If you remove those lines, however, the page number is reset to 0, just like in your example.

\documentclass{article}
\usepackage{soul}
\usepackage{blindtext}

\makeatletter
\newcount\SOUL@minus %% <- without this line the page number would be reset to 0
\makeatother

\begin{document}

\Blindtext[4]

\blindtext
\hl{Closing words

New paragraph!}

\end{document}

This is the bottom of the first page of the output:

output

This is what it would've been without this addition to the preamble:

what could've been