Why does LaTeX tell me that the external text file has a last line which contains the token \par?

external-files, tex-core

Why does LaTeX (TeX Live 2020, Debian) tell me that SomeLines.txt has a 6th line which contains the token \par?

I know that with \endlinechar being 13 and character 13 (carriage return) having category code 5 (end of line), an empty line yields \par: a carriage-return character is appended to the empty line, and that character is processed while TeX's reading apparatus is in state N.

So I suppose TeX's routine for reading a line of input "thinks" there is an empty line/record after Line 5 and appends the end-of-line character, whose category code is 5 (end of line). When that character is processed, TeX's reading apparatus is in state N, so the appended end-of-line character yields the control-word token \par.

If so:

  • Why does TeX think that there is an empty line/an empty record after Line 5?
  • Do the routines for reading a line of input of all TeX distributions think so?
\begin{filecontents*}{SomeLines.txt}
Line1
Line2
Line3
Line4
Line5
\end{filecontents*}

\newread\SomeLinesRead
\newcount\SomeLinesCount
\def\CountLinesLoop{%
  \ifeof\SomeLinesRead
  \else
     \immediate\read\SomeLinesRead to \Thisline
     \global\advance\SomeLinesCount by 1 %
     \message{^^JLine \number\SomeLinesCount="\Thisline"}%
  \expandafter\CountLinesLoop\fi
}
\immediate\openin\SomeLinesRead SomeLines.txt
\SomeLinesCount=0
\begingroup%
% Changes of the value of \endlinechar are reflected in the line "Line6=...".
% If \endlinechar denotes the number of the code-point of a
% character of category 5(end of line), the state of the reading-
% apparatus is reflected, too, which apparently is N when the
% character appended to line/record 6 is processed.
%\endlinechar=`\A  % -> Line 6="A" 
%\endlinechar=`\B  % -> Line 6="B" 
\CountLinesLoop%
\endgroup%
\immediate\closein\SomeLinesRead
\message{^^JTotal amount of lines: \number\SomeLinesCount}%
\stop

Console output:

This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) (preloaded format=pdflatex)
 restricted \write18 enabled.
entering extended mode
(./test.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-02-18>

LaTeX Warning: Writing file `./SomeLines.txt'.



Line 1="Line1 " 
Line 2="Line2 " 
Line 3="Line3 " 
Line 4="Line4 " 
Line 5="Line5 " 
Line 6="\par " 
Total amount of lines: 6 )
No pages of output.
Transcript written on test.log.

I assume:

The condition for terminating the loop is not "if the last record-terminator of the file is reached".

The condition for terminating the loop is \ifeof, i.e., "if the end of the file is reached".

There are five lines/records and thus five "record-terminators", plus one "end of file", belonging to SomeLines.txt. None of the lines contains unmatched characters of category 1 (begin group), so in this scenario a single \read-operation processes a single line/record. That makes a total of six reading iterations until the end of the file is reached. In the 6th iteration the "line/record" is considered empty because there is nothing between the last "record-terminator" and the "end of file".
TeX's handling of empty lines and the way the token \par comes into being are described above.

Is this interpretation correct?


Addendum on November 3, 2021:

In his answer David Carlisle pointed out:

tex.web 9507 says

@ An empty line is appended at the end of a |read_file|.
@^empty line at end of file@>

It seems this means that an empty line/record is appended, and that this is done on purpose.

Thus subsequent questions are:

Why? What purpose?

Did I overlook the documentation of this "feature" in the TeXbook?

Is this one of the many features that you only become aware of by studying tex.web?


Addendum on January 20, 2022:

The TeXbook says in Chapter 20: Definitions (also called Macros):

To get input from an open file, you say
\read⟨number⟩to⟨control sequence⟩
and the control sequence is defined to be a parameterless macro whose replacement text is the contents of the next line read from the designated file. This line is converted to a token list, using the procedure of Chapter 8, based on the current category codes.
Additional lines are read, if necessary, until an equal number of left and right braces has been found. An empty line is implicitly appended to the end of a file that is being \read.
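
The brace-balancing rule from this passage can be seen in a minimal sketch like the following. The file name Braces.txt and the names \BracesRead, \First, \Second and \Third are made up for this illustration. The \immediate prefix is left out because \openin, \read and \closein are always carried out at once.

\begin{filecontents*}{Braces.txt}
A{B
C}D
E
\end{filecontents*}

\newread\BracesRead
\openin\BracesRead Braces.txt
% Line 1 has an unmatched {, so this one \read also consumes line 2:
\read\BracesRead to \First
% Line 3 is balanced and is read on its own:
\read\BracesRead to \Second
% The file is exhausted; this reads the implicitly appended empty line:
\read\BracesRead to \Third
\closein\BracesRead
\message{^^JFirst: \meaning\First}%
\message{^^JSecond: \meaning\Second}%
\message{^^JThird: \meaning\Third}%
\stop

With the usual \endlinechar=13 and the usual category codes, the replacement text of \First should come out as "A{B C}D ", that of \Second as "E " and that of \Third as "\par ", so three \read-assignments cover three file lines plus the appended empty line.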


Addendum on December 2, 2023:

The questioner (Ulrich Diez) supposes that the empty line is implicitly appended for the following reason:

With some file systems/operating systems, detecting that a file is empty requires an attempt at reading from the file and "seeing" whether this attempt immediately reaches the end of the file.

Thus the result of attempting to perform a \read-assignment needs to be determined for the case of the file being empty.

Implicitly appending an empty line determines the result of attempting to perform a \read-assignment for the case of the file being empty.

Thus, on such file systems/operating systems, implicitly appending an empty line makes it possible to perform a \read-assignment that adjusts the \ifeof-switch even if the file in question is empty.

So, as a rule of thumb, first have TeX perform a \read-assignment.
Then have TeX perform an \ifeof-test.

In case the \ifeof-test is false the replacement text of the macro defined in the course of the \read-assignment consists of material coming from the file.

In case the \ifeof-test is true, the replacement text of the macro defined in the course of the \read-assignment consists of material that does not come from the file but from what TeX makes of the implicitly appended empty line.

The implicit appending of an empty line also makes it possible to check whether a file exists:

In case the file does exist, \ifeof is not true right after opening the file for reading; it becomes true only after the \read-assignment that reads the implicitly appended empty line.

In case the file does not exist, \ifeof is true right after opening the file for reading without performing \read-assignments.
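
This existence test can be wrapped in a small helper. The following is only a sketch: the names \ProbeRead and \IfFileOpens are hypothetical, and the test really checks whether \openin can open the file, not merely whether it exists.

\newread\ProbeRead
% #1 = file name, #2 = code if the file could be opened, #3 = code otherwise
\def\IfFileOpens#1#2#3{%
  \openin\ProbeRead=#1 %
  \ifeof\ProbeRead
    % \ifeof is true right after \openin: nothing was opened.
    \closein\ProbeRead
    #3%
  \else
    % \ifeof is false: the file is open (it may still be empty).
    \closein\ProbeRead
    #2%
  \fi
}
\IfFileOpens{SomeLines.txt}{\message{^^Jfound}}{\message{^^Jnot found}}%

Because #2 and #3 are carried out inside the conditional, they should not try to grab arguments that follow the call of \IfFileOpens. A file name containing spaces is not handled by this sketch.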

\begin{filecontents*}{SomeLines.txt}
Line1
Line2
Line3
Line4
Line5
\end{filecontents*}

\begin{filecontents*}{SomeLinesB.txt}
\end{filecontents*}

\newread\SomeLinesRead
\newcount\SomeLinesCount
\def\CountLinesLoop{%
  % In the course of reading what TeX makes of the implicitly appended
  % empty line TeX also adjusts the \ifeof-switch:
  \immediate\read\SomeLinesRead to \Thisline
  \ifeof\SomeLinesRead
    \message{^^JLine implicitly appended by TeX="\Thisline"}%
  \else
    % Only advance \SomeLinesCount in case the line just read is not the
    % implicitly appended empty line, after which \ifeof yields the true-branch.
    \global\advance\SomeLinesCount by 1 %
    \message{^^JLine \number\SomeLinesCount="\Thisline"}%
    \expandafter\CountLinesLoop
  \fi
}

\message{^^J}%
\message{^^J}%
\message{^^JProcesing file SomeLines.txt}%
\message{^^J============================}%
\message{^^J}%
\immediate\openin\SomeLinesRead SomeLines.txt
\message{^^JFile SomeLines.txt does \ifeof\SomeLinesRead not \fi exist}%
\message{^^J}%
\SomeLinesCount=0
\begingroup
%\endlinechar=`\A  % -> Line implicitly appended by TeX="A" 
%\endlinechar=`\B  % -> Line implicitly appended by TeX="B" 
\CountLinesLoop%
\endgroup%
\immediate\closein\SomeLinesRead
\message{%
  ^^J%
  \ifnum0=\SomeLinesCount 
     The file is empty.
  \else
    Total amount of lines of the file is: \number\SomeLinesCount.
  \fi
}%

\message{^^J}%
\message{^^J}%
\message{^^JProcesing file SomeLinesB.txt}%
\message{^^J=============================}%
\message{^^J}%
\immediate\openin\SomeLinesRead SomeLinesB.txt
\message{^^JFile SomeLinesB.txt does \ifeof\SomeLinesRead not \fi exist}%
\message{^^J}%
\SomeLinesCount=0
\begingroup
%\endlinechar=`\A  % -> Line implicitly appended by TeX="A" 
%\endlinechar=`\B  % -> Line implicitly appended by TeX="B" 
\CountLinesLoop%
\endgroup%
\immediate\closein\SomeLinesRead
\message{%
  ^^J%
  \ifnum0=\SomeLinesCount 
     The file is empty.
  \else
    Total amount of lines of the file is: \number\SomeLinesCount.
  \fi
}%

\message{^^J}%
\message{^^J}%
\message{^^JProcesing non-existent file SomeLinesC.txt}%
\message{^^J==========================================}%
\message{^^J}%
\immediate\openin\SomeLinesRead SomeLinesC.txt
\message{^^JFile SomeLinesC.txt does \ifeof\SomeLinesRead not \fi exist}%
\immediate\closein\SomeLinesRead

\stop

Console output:

This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) (preloaded format=pdflatex)
 restricted \write18 enabled.
entering extended mode
(./test.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-02-18>

LaTeX Warning: Writing file `./SomeLines.txt'.



LaTeX Warning: Writing file `./SomeLinesB.txt'.





Procesing file SomeLines.txt 
============================ 

File SomeLines.txt does exist 

Line 1="Line1 " 
Line 2="Line2 " 
Line 3="Line3 " 
Line 4="Line4 " 
Line 5="Line5 " 
Line implicitly appended by TeX="\par "

Total amount of lines of the file is: 5.  


Procesing file SomeLinesB.txt 
============================= 

File SomeLinesB.txt does exist 

Line implicitly appended by TeX="\par " 
The file is empty.  


Procesing non-existent file SomeLinesC.txt

========================================== 

File SomeLinesC.txt does not exist )
No pages of output.
Transcript written on test.log.

Best Answer

When reading files, TeX acts essentially the same way whether or not there is an end-of-line character at the end of the file. The final line is taken as a line, and \endlinechar is added to the end of each line if it is set to a legal character value.

tex.web 9507 says

@ An empty line is appended at the end of a |read_file|.
@^empty line at end of file@>

So if \endlinechar has its normal value 13 and character 13 has catcode 5, that appended empty line ends up being reported as \par, as usual. If you set \endlinechar=-1, the final line read before \ifeof becomes true is reported as empty.
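
The effect of \endlinechar=-1 can be checked with a sketch along these lines. It assumes that SomeLines.txt from the question exists; \DemoRead and \ShowLinesLoop are names invented for the illustration. The group is kept on a single source line because, while \endlinechar is -1, the lines of the .tex file itself are read without an end-of-line character as well.

\newread\DemoRead
\def\ShowLinesLoop{%
  % Read first, then test \ifeof: the test is true only after the
  % implicitly appended empty line has been read.
  \read\DemoRead to \Thisline
  \ifeof\DemoRead
    \message{^^JAppended line: "\Thisline"}%
  \else
    \message{^^JFile line: "\Thisline"}%
    \expandafter\ShowLinesLoop
  \fi
}
\openin\DemoRead SomeLines.txt
\ifeof\DemoRead
  \message{^^JSomeLines.txt could not be opened}%
\else
  \begingroup \endlinechar=-1 \ShowLinesLoop \endgroup
\fi
\closein\DemoRead

With \endlinechar=-1 the five file lines should show up without the trailing space, and the appended line should show up as empty ("") instead of "\par ".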
