[TeX/LaTeX] Avoiding a page break after the first word of a sentence

page-breaking

Is there a way to tell TeX to avoid breaking the page after the first word of a sentence?



He was quite dead. Apparently his neck had been
broken. The lightning flashed for a third time, and
his face leaped upon me. I sprang to my feet. It
(text continues on next page)

And then you have to turn the page for the rest of the sentence. The word is not on a line of its own, so it cannot be penalized like an orphan line.

Can TeX be told to resolve these by, say, breaking the page before that first word?

Note:

Although the answers given below are very informative, the general consensus has been that the best practice is to leave this for the proofreaders to spot and then fix manually.
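As a rough illustration of what such a manual fix can look like (a sketch only, not something the answers prescribe): a tie after the stranded word stops TeX from breaking the line, and therefore the page, right after it. The word "was" below is a placeholder invented for the example, since the excerpt above is cut off at the page break.

```latex
\documentclass{article}
\begin{document}
He was quite dead. Apparently his neck had been broken. The lightning
flashed for a third time, and his face leaped upon me. I sprang to my
% "was" is a made-up continuation; the tie (~) keeps "It" on the same
% line as the next word, so "It" can no longer end the page alone.
feet. It~was
\end{document}
```

The tie can change the line breaks of the whole paragraph, so the result still needs a look in the final proof.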

Best Answer

EDIT: I forgot to mention that although this whole answer works in simple cases, it is a bad idea to rely on it for anything serious, since it can break in many different ways. Typically, catcode changes are a bad idea...

EDIT: Lev Bishop pointed out that inserting \nopagebreak after the first word of every sentence is too much, because it forbids a page break after every line containing the first word of a sentence. Here, I fixed this problem by using the auxiliary file and checking the page number on both sides of the space following the first word of the sentence.
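The page-number check relies on the fact that a non-immediate \write is expanded only when the page is shipped out. A minimal sketch of that mechanism, assuming a hypothetical helper \recordpage that is not part of the code below:

```latex
\documentclass{article}
\makeatletter
% \recordpage{<label>} is a hypothetical helper for illustration.
% The \write is delayed until shipout, so \the\count0 expands to the
% number of the page on which this point of the text actually lands.
% \string\@gobble makes the aux file discard the note when it is read.
\newcommand{\recordpage}[1]{%
  \write\@mainaux{\string\@gobble{#1: page \the\count0}}}
\makeatother
\begin{document}
Some text\recordpage{mark-a} and more text.
\end{document}
```

After a run, the .aux file should contain a line such as \@gobble{mark-a: page 1}. With \immediate\write the page number would instead be read while the paragraph is still being collected, before TeX has decided where the page ends.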

It is also possible to make ., !, ? active, let them read the next word and place \nopagebreak after the first word of each sentence (except the first one of a paragraph).
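To see why a \nopagebreak inside a paragraph affects a whole line, here is a small sketch using the first words of the demonstration text. To the best of my knowledge, in horizontal mode LaTeX implements \nopagebreak[4] essentially as \vadjust{\penalty10000}; treat the second line of the example as an approximation of the kernel code, not a copy of it.

```latex
\documentclass{article}
\begin{document}
% The penalty is attached below whichever line "He" ends up in after
% line breaking, so it forbids a page break after that entire line,
% not just after the word.
Greetings. He\nopagebreak[4] will. Will he?

% Roughly equivalent low-level form (ignoring LaTeX's space handling):
Greetings. He\vadjust{\penalty10000} will. Will he?
\end{document}
```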

Things are more complicated if we still want to use . in dimensions (e.g., width=3.4cm in \includegraphics). Also, the last punctuation mark of a paragraph needs special treatment, in particular when the paragraph does not quite end with that punctuation mark (e.g., when a closing quote follows it).
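The code below tells a decimal point from a sentence-ending period by testing whether the next token is a digit. A minimal standalone sketch of that test, wrapped in a hypothetical helper \IfDigit that does not appear in the answer's code:

```latex
\documentclass{article}
\makeatletter
% \IfDigit{<token>}{<true>}{<false>} is a hypothetical helper.
% If #1 is a digit d, TeX reads the number 1d (10 to 19), so 9<1d is true.
% Otherwise the number is just 1, and 9<1 is false.
\newcommand{\IfDigit}[1]{%
  \ifnum9<1#1\space
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
\makeatother
\begin{document}
\IfDigit{4}{digit}{not a digit} % typesets "digit"

\IfDigit{c}{digit}{not a digit} % typesets "not a digit"
\end{document}
```

The test only looks at one token, so it is easy to fool with unusual input; that is one of the ways in which the approach can break.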

Hopefully, the code below works. Currently, I've inserted * after \nopagebreak, just to visualize the places where a \nopagebreak is inserted. Of course, remove it.

\documentclass[a5paper]{article}


\makeatletter
% \begin{macro}{\eos@text}
% The code below inserts "\eos@text" each time a space following the
% first word of the sentence falls on the separation between two pages.
%    \begin{macrocode}
\newcommand{\eos@text}{\nopagebreak[4]*}
%    \end{macrocode}
% \end{macro}
% 
% \begin{macro}{\eos@active,\eos@active@text}
%   
%   "#1" is the character (".", "!", "?") that ended the sentence.
%   We distinguish various cases depending on the following non-space 
%   character, "#2". In every case, we start by putting the
%   punctuation "#1" back.
%   
%   If "#2" is a digit, we assume that we are in the middle of a
%   number such as "width=5.3em" in, say, "\includegraphics".
%   (This is only relevant for ".", though.)
%   
%   If "#2" is "\par", that means that the punctuation is the last 
%   one in the paragraph, so we can safely do nothing.
%   
%   If "#2" is a quote, we need to treat things differently. (Here
%   we actually pretend that the quote is in fact the end-of-sentence.)
%   
%   Finally, in every other case, we grab the first word and place
%   a non-breakable space afterwards.
%   
%   In each case, we put back what directly followed the punctuation
%   right after our test.
% 
%   \begin{macrocode}
\newcommand{\eos@active}[2]{%
  #1%
  \ifnum9<1#2\space 
  \else
    \ifx\par#2%
    \else
      \ifx'#2%
        \expandafter\expandafter\expandafter\expandafter
        \expandafter\expandafter\expandafter\eos@active
      \else
        \expandafter\expandafter\expandafter\expandafter
        \expandafter\expandafter\expandafter\eos@active@text
      \fi
    \fi
  \fi
  #2%
}
%    \end{macrocode}
%    Grabbing the following word: the first "\newcommand" checks that
%    the command is not already defined. Then we define it through "\def"
%    because its argument is a bit more complicated than usual, delimited
%    by a space. Also to note is the initial space (before "#1"): that 
%    was lost in our test, and we put it back.
%    
%    Earlier, we were putting a "\nopagebreak" after that first word,
%    but now, we do something more tricky, only putting a "\nopagebreak"
%    if at the previous run of LaTeX there was a page break there.
%    
%    \begin{macrocode}
\newcommand{\eos@active@text}{}
\def\eos@active@text#1 { #1\eos@space}
%    \end{macrocode}
% \end{macro}

% \begin{macro}{\eos@space}
%   As Lev Bishop mentions, putting "\nopagebreak" forbids a page break
%   after the current line. So we don't want to always insert a
%   "\nopagebreak"! The test \emph{is} crazy\ldots Too lazy to explain the 
%   details. "\count0" is the page number, "\write" rather than 
%   "\immediate\write" in order to get the page number when typeset
%   rather than when read. "\csname eos@mark@\the\count0\endcsname"
%   creates a control sequence (equal to relax) corresponding to the
%   page number. And the test "\eos@pagetest" checks whether the
%   control sequence corresponding to the page \emph{after} the space
%   is already defined. If it is not, we write something to the aux file.
%   
%   
%   \begin{macrocode}
\newcount\eos@current
\newcount\eos@pageno
\newcommand{\eos@space}{%
  \advance\eos@current by\@ne
  \write\@mainaux{\relax
    \expandafter\@gobble\csname eos@mark@\the\count0\endcsname}%
  \csname eos@\romannumeral\eos@current\endcsname
  \space
  \write\expandafter\@mainaux\expandafter{%
    \expandafter\eos@pagetest\expandafter{\romannumeral\eos@current}\relax}%
}
%   \end{macrocode}
% \end{macro}
% 
% \begin{macro}{\eos@pagetest}
%   If the page number is a brand new page number (i.e. if 
%   "\csname eos@mark@\the\count0\endcsname" is not yet defined),
%   we write something to the aux file. Otherwise, we don't do anything.
%    \begin{macrocode}
\newcommand{\eos@pagetest}[1]{%
  \unless\ifcsname eos@mark@\the\count0\endcsname
  \noexpand\eos@rewrite{\gdef\csname eos@#1\endcsname{\noexpand\eos@text}}%
  \fi
}%
\newcommand{\eos@rewrite}[1]{#1%
  \ifx\usepackage\documentclass
  \expandafter\@gobble
  \else
  \expandafter\AtBeginDocument
  \fi
  {\immediate\write\@mainaux{\unexpanded{\eos@rewrite{#1}}}}%
}
%    \end{macrocode}
%    "\eos@rewrite" is meant for use in the aux file, and 
%    rewrites itself to the aux file. The test is very crude: it
%    checks whether we are reading the aux file at the start
%    or the end of the document (any better test?).
%
%    If we didn't rewrite, a space that changes page would lead
%    to inserting "\nopagebreak[4]", but at the next run that would
%    prevent a page break. Then the space would no longer be at the
%    change of a page. So it would not insert "\nopagebreak[4]" for
%    the next run. Thus, in the next run, the space would (probably)
%    be at the change of pages again, etc. 
%    
%    So we make that "\nopagebreak" resilient. If you need to reset 
%    all of this, just delete the .aux file.
% \end{macro}
%  

% \begin{macro}{\activate@eos}
%     It's better to make ".", "!", "?" active at "\begin{document}".
%     For that we define "\activate@eos" which makes its 
%     argument active, and defines it to be an end-of-sentence ("eos").
%     \begin{macrocode}
\newcommand{\activate@eos}[1]{%
  \begingroup
  \lccode`\~`#1\space
  \lowercase{%
    \endgroup
    \catcode`#1=13\relax
    \newcommand{~}{\eos@active{#1}}%
  }%
}
\AtBeginDocument{%
  \activate@eos{.}%
  \activate@eos{!}%
  \activate@eos{?}%
}
%     \end{macrocode}
%   Am I missing a possible end-of-sentence marker?
% \end{macro}
\makeatother


% ==========================================================
% Just for demonstration
\usepackage[text={5cm,36pt}]{geometry}


\begin{document}

% We repeat the text until it fills 10 pages.
\loop\ifnum\count0<10\relax 
Greetings. He will. Will he? No, he won't. Maybe not, anyways. Although, perhaps. And that changes. Constantly. Is it worth? It really is not. Short sentences, why? To test better. Make sure it works!

I'm lazy. So copy. And paste. Repeating the same. Many times. Of course! Just a bit more.

\repeat

\end{document}