[TeX/LaTeX] Finding all widows and orphans

page-breaking widows-orphans

There have been several questions about how to avoid widows and orphans, so let me first state that that is not the question here. I know I can try and tweak the settings but that it's still possible to end up with widows and orphans.

What I do want to ask is whether there is some easy way (package) to get a list of all the widows and orphans that are in the document. Written to a separate file or just outputted when the processing of the file is done.

Best Answer

Yes, this is possible if you are prepared to accept a certain level of false positives. Basically, TeX uses special penalties for both cases, and those can be recognized in the output routine. So my code below sets \clubpenalty (orphans) to 152 and \widowpenalty to 153 (the LaTeX default for both is 150). Then we use the following code:

\documentclass %[twocolumn]
    {article}

\clubpenalty=152
\widowpenalty=153

% we want to know if we are on first or second column in a 2 column document
\makeatletter
  \def\oncol{\if@twocolumn \space \if@firstcolumn  (first \else (second \fi column)\fi}
\makeatother

% check if the output penalty was due to orphan or widow or both
\def\testforwidowsandorphans{%
  \ifnum\outputpenalty=153
    \typeout{*** Widow on page \thepage \oncol}%
  \else
    \ifnum\outputpenalty=152
      \typeout{*** Orphan on page \thepage \oncol}%
    \else
      % 305 = 152 + 153: a two-line paragraph broken in the middle
      \ifnum\outputpenalty=305
        \typeout{*** Orphan and Widow on page \thepage \oncol}%
      \fi
    \fi
  \fi
}

% execute this code at the very beginning of the output routine (OR)
\toks0=\output
\output\expandafter{\expandafter\testforwidowsandorphans
                                   \the\toks0}


\newcommand\stupidpara{First line\\second line\\and final line\par}
\newcommand\verystupidpara{First line\\and final one\par}

\setlength\textheight{5\baselineskip}

\begin{document}
  \stupidpara\stupidpara\stupidpara\stupidpara
   \verystupidpara\verystupidpara\verystupidpara\verystupidpara
\end{document}

Basically we test whether the \outputpenalty that triggered the page break is 152, 153, or 305, i.e., the sum of the two (which is the case if a two-line paragraph is broken, because its single break point is both an orphan and a widow position). That will give us the output:

*** Widow on page 1
[1]
*** Orphan on page 2
[2]
*** Orphan and Widow on page 3

and if we typeset the same document in twocolumn mode we get

*** Widow on page 1 (first column)
*** Orphan on page 1 (second column)
[1]
*** Orphan and Widow on page 2 (first column)

You may find that other page breaks produce the same penalties (which then gives you false positives), so choosing the initial values carefully is essential. Of course you could use 150 for both and not distinguish between widow and orphan.
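Because the tested numbers have to track whatever penalties you choose, it can help to record them once instead of hard-coding 152, 153 and 305 in two places. The following is only a sketch of that idea, assuming an engine with e-TeX's \numexpr (true for current LaTeX formats); the macro names \WOclub, \WOwidow and \WOboth are made up for this illustration and are not part of the code above. It is meant as a drop-in variant of the \testforwidowsandorphans definition and omits the \oncol suffix for brevity.

```latex
\clubpenalty=152
\widowpenalty=153

% Hypothetical helper macros: freeze the chosen values at preamble time,
% so later local changes to \clubpenalty do not affect the test.
\edef\WOclub{\the\clubpenalty}
\edef\WOwidow{\the\widowpenalty}
\edef\WOboth{\the\numexpr\clubpenalty+\widowpenalty\relax}

% Same three-way test as before, but driven by the recorded values
\def\testforwidowsandorphans{%
  \ifnum\outputpenalty=\WOwidow\relax
    \typeout{*** Widow on page \thepage}%
  \else\ifnum\outputpenalty=\WOclub\relax
    \typeout{*** Orphan on page \thepage}%
  \else\ifnum\outputpenalty=\WOboth\relax
    \typeout{*** Orphan and Widow on page \thepage}%
  \fi\fi\fi
}
```

With this variant, trying a different pair of values only requires changing the two assignments at the top.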

Final note: one should probably also add \displaywidowpenalty to the test (its default in LaTeX is 50), and instead of a simple \typeout one could think of a more elaborate output, but that is syntactic sugar.
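A rough sketch of how the \displaywidowpenalty case might be added is shown here. The value 154 is an assumption, picked only because it differs from 152 and 153 and from the 151 that \pagebreak[2] produces; the macro name \testfordisplaywidows is likewise invented for this example. It hooks into \output in the same way as the code above, so both tests can coexist.

```latex
% Assumed value: any number not otherwise produced by the document would do
\displaywidowpenalty=154

% Hypothetical additional test for widows in front of a math display
\def\testfordisplaywidows{%
  \ifnum\outputpenalty=154
    \typeout{*** Display widow on page \thepage}%
  \else
    % 306 = 152 + 154: a two-line paragraph before a display, broken in the middle
    \ifnum\outputpenalty=306
      \typeout{*** Orphan and display widow on page \thepage}%
    \fi
  \fi
}

% prepend the test to the current output routine
\toks0=\output
\output\expandafter{\expandafter\testfordisplaywidows\the\toks0}
```

As with the other values, this changes the penalty from its default of 50, so page breaks before displays may come out differently than without the test.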

Small update

As remarked by David Carlisle elsewhere, it is better not to use 151 (as I did initially for the \clubpenalty), because standard LaTeX uses 151 for \pagebreak[2], so we would get some unnecessary false positives. Of course, if any of these default values are changed, the above code would need to change too.

Also worth noting: changing the penalties, even by such small amounts, means that the break behavior of your document could be altered, i.e., it may break differently after you have added that code. As this version only adds warnings, it is therefore best to keep using it all the time and not make the mistake of removing it just before the "final" run after having corrected all the problems it reported. Removing it may mean that you see new breaks afterwards: unlikely, but not impossible!

Now available as a package

A much extended version of the above code is now available on CTAN as the package widows-and-orphans. It automatically calculates the penalty values so that everything is unique and detectable. Besides widows and orphans, it also detects hyphenation across column or page boundaries and math displays that got separated from the preceding text (in case that is allowed in the document).
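A minimal way to try the package is simply to load it in the preamble, as in the sketch below. This assumes the package is installed under its CTAN name and relies on its default behavior only; the available options are described in the package documentation and are not shown here.

```latex
\documentclass{article}

% Loading the package replaces the hand-written penalty settings
% and the output-routine hook shown earlier in this answer.
\usepackage{widows-and-orphans}

\begin{document}
Some text long enough to span several pages \ldots
\end{document}
```

Any problems it finds are then reported during the LaTeX run, so the hand-chosen values 152 and 153 are no longer needed.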

The next issue of TUGboat will contain an article that discusses various ways to fix such issues. I will also put it up soon at https://latex-project.org/publications
