How to persuade (La)TeX(3) to read a file character by character

expl3, external-files

I'm doing the Advent of Code (2022) in LaTeX3 (no, really, I am). Each challenge comes with some input, which for Day 6 is a single, very long line (4096 characters). That line has to be processed character by character, so I thought I would get TeX to read it in character by character too. Now, I know that TeX can easily cope with a token list of that length, but my purpose in doing the AoC in LaTeX3 is to understand the language better rather than to find the most efficient or effective solutions. So even though I ended up not using a character-by-character method, and instead read the whole file into a single token list, I'd still like to know how to do it.
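A sketch of that whole-file approach (not necessarily the code used here) could read the file with \file_get:nnN and then visit it one character at a time with \str_map_inline:Nn. The file name input.txt and all variable names are placeholders, and it assumes the input is a single line of ordinary characters: the file is tokenized with the catcodes in force, so under expl3 syntax any spaces in it would be dropped.

```latex
\documentclass{article}
\ExplSyntaxOn
% Placeholder names; input.txt stands for the puzzle input file.
\tl_new:N \l_aoc_input_tl
\str_new:N \l_aoc_input_str
\int_new:N \g_aoc_count_int
% Read the whole file into one token list. Setting \endlinechar to -1
% stops TeX from adding a space token at the end of the line.
\file_get:nnN { input.txt }
  { \int_set:Nn \tex_endlinechar:D { -1 } }
  \l_aoc_input_tl
% Convert to a string, then visit it one character at a time.
\str_set:NV \l_aoc_input_str \l_aoc_input_tl
\str_map_inline:Nn \l_aoc_input_str
  { \int_gincr:N \g_aoc_count_int } % replace with real per-character work
\iow_term:x
  { Read~ \int_use:N \g_aoc_count_int \c_space_tl characters }
\ExplSyntaxOff
\begin{document}
\end{document}
```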

Here's what I tried:

\cs_set_eq:Nc \aoc_input: {@@input} % to get an expandable file input
\cs_new:Npn \aoc_process_char:w #1
{
  \tl_if_eq:nnF {#1} {\par} % this doesn't work
  {
    \aoc_process_char:w
  }
}

\exp_after:wN \aoc_process_char:w \aoc_input: file.txt\relax

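This doesn't work: TeX won't let a file end while it is still absorbing a macro argument (the end of a file gets the same check as an \outer token), so the recursion fails with "File ended while scanning use of \aoc_process_char:w". So, reading around on this site, I saw the suggestion to \let something to each token in turn and test that, something like: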

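\makeatletter % \@@input is LaTeX's saved copy of the \input primitive
\def\dostuff{%
  \ifx\tmpvar\par % stop when a \par token (e.g. from a blank line) is read
  \else
    \expandafter
    \slurptoken
  \fi
}
% The space after "=" is the one optional space that \let skips, so a
% space token coming from the file is assigned rather than swallowed.
\def\slurptoken{\afterassignment\dostuff\let\tmpvar= }
\expandafter\slurptoken\@@input file.txt\relax
\makeatother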

I'm struggling to convert that to LaTeX3 syntax. It feels like one of the \peek_after:Nw type of functions should be what I'm looking for, but I need one that removes the token from the input stream, and the ones with remove in their name only test the character code or category code, whereas I want to be able to do more processing with that token.

So what is the best LaTeX3 equivalent of the TeX code above?
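For reference, a literal transliteration of that \let loop into expl3 names is possible, sketched below under some assumptions: all \demo_... names are hypothetical, \cs_set_eq:NN is used because it is expl3's name for \let, and the \q_stop quark serves as the end marker supplied by \everyeof. It inherits the limits of the \let approach (for instance, an explicit { and \bgroup end up with the same meaning).

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical names throughout. \tex_afterassignment:D is the
% \afterassignment primitive; \cs_set_eq:NN is \let.
\cs_new_protected:Npn \demo_slurp_token:
  {
    \tex_afterassignment:D \demo_do_stuff:
    % The explicit space after "=" is the optional space \let skips,
    % so a space token in the input is assigned instead of being lost.
    \cs_set_eq:NN \l_demo_token = ~
  }
\cs_new_protected:Npn \demo_do_stuff:
  {
    % \q_stop is a standard expl3 quark, used here as the end marker
    \token_if_eq_meaning:NNF \l_demo_token \q_stop
      {
        % ... process \l_demo_token here ...
        \demo_slurp_token:
      }
  }
% \everyeof inserts the end marker before TeX closes the file.
\cs_new_protected:Npn \demo_slurp_file:n #1
  {
    \group_begin:
      \tex_everyeof:D { \q_stop }
      \exp_after:wN \demo_slurp_token: \tex_input:D {#1}
    \group_end:
  }
\ExplSyntaxOff
\begin{document}
\end{document}
```

If \demo_slurp_file:n is called while expl3 syntax is active, spaces in the file are ignored, so a document-level wrapper command is the safer way to call it.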

Best Answer

The easiest, I think, is using

\peek_analysis_map_inline:n { <inline code> }

This function repeatedly removes one token from the input stream and passes it to the <inline code>, until \peek_analysis_map_break: is called. The <inline code> receives three arguments:

  1. Tokens that both o- and x-expand to the token seen;
  2. The character code of the token (-1 if it's a control sequence);
  3. The category code of the token, as a single hexadecimal digit (0 if it's a control sequence).

With that information you can do pretty much any processing you need on the input.
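As a small sketch of how the three arguments are used, the hypothetical command below counts the characters that follow it and stops at (and removes) the first control sequence, detected through the -1 character code in the second argument. \peek_analysis_map_break:n runs its argument once the loop has stopped. A global integer is used so the count does not depend on how the mapping is grouped internally.

```latex
\documentclass{article}
\ExplSyntaxOn
\int_new:N \g_demo_count_int
% Hypothetical command: count the characters that follow it, stopping at
% (and removing) the first control sequence.
\cs_new_protected:Npn \demo_count_chars:w
  {
    \int_gzero:N \g_demo_count_int
    \peek_analysis_map_inline:n
      {
        % ##2 is the character code, or -1 for a control sequence
        \int_compare:nNnTF {##2} = { -1 }
          {
            \peek_analysis_map_break:n
              {
                \iow_term:x
                  { \int_use:N \g_demo_count_int \c_space_tl characters }
              }
          }
          { \int_gincr:N \g_demo_count_int }
      }
  }
\demo_count_chars:w abcd \scan_stop: % should write "4 characters"
\ExplSyntaxOff
\begin{document}
\end{document}
```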

\peek_analysis_map_inline:n is particularly nice because it goes to some lengths to distinguish an explicit { from \bgroup, which is not trivial (and not possible with the usual \futurelet approach, since both tokens have the same meaning), and because it makes it very easy to process each token as you need.
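To illustrate the difference, the sketch below first uses \peek_after:Nw (expl3's \futurelet) through a hypothetical helper \demo_report_meaning:, which sees an explicit { and \bgroup as the same meaning, and then lets \peek_analysis_map_inline:n report the character and category codes of the tokens themselves, stopping at the first control sequence.

```latex
\documentclass{article}
\ExplSyntaxOn
% \peek_after:Nw sets \l_peek_token to the next token without removing it.
% This hypothetical helper reports that token's meaning.
\cs_new_protected:Npn \demo_report_meaning:
  {
    \token_if_eq_meaning:NNTF \l_peek_token \c_group_begin_token
      { \iow_term:n { meaning:~begin-group~character } }
      { \iow_term:n { meaning:~something~else } }
  }
% Both lines report a begin-group character: { and \bgroup share a meaning.
\peek_after:Nw \demo_report_meaning: { }
\peek_after:Nw \demo_report_meaning: \bgroup \egroup
% \peek_analysis_map_inline:n inspects the tokens themselves: the explicit
% braces give catcodes 1 and 2, while \bgroup is a control sequence
% (char code -1, catcode 0), at which point the loop stops.
\peek_analysis_map_inline:n
  {
    \iow_term:n { char~code~#2,~catcode~#3 }
    \int_compare:nNnT {#2} = { -1 } { \peek_analysis_map_break: }
  }
{ } \bgroup
\ExplSyntaxOff
\begin{document}
\end{document}
```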

Below is an example. It starts with the basics by checking that the file exists. Then it sets \tex_everyeof:D { \__loopspace_file_process_end: }, so that a marker token is inserted when the end of the file is reached. Finally it inputs the file and processes it with \peek_analysis_map_inline:n, breaking out of the loop when the marker comes along.

\ExplSyntaxOn
\NewDocumentCommand \processfile { m }
  { \loopspace_file_process:n {#1} }
\cs_new_protected:Npn \loopspace_file_process:n #1
  {
    \file_if_exist:nTF {#1}
      { \__loopspace_file_process:n {#1} }
      { \msg_error:nnn { loopspace } { file-not-found } {#1} }
  }
\msg_new:nnn { loopspace } { file-not-found } { File~'#1'~not~found. }
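\cs_new_protected:Npn \__loopspace_file_process:n #1
  {
    \group_begin:
      % e-TeX's \everyeof: insert this marker when the file ends
      \tex_everyeof:D { \__loopspace_file_process_end: }
      % open the file, then start the loop on its first token
      \exp_after:wN \__loopspace_file_process:w \tex_input:D {#1}
    \group_end:
  }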
\cs_new_protected:Npn \__loopspace_file_process:w
  {
    \peek_analysis_map_inline:n
      {
        % ##2 is -1 for a control sequence: stop at the end-of-file marker
        \int_compare:nNnT {##2} = { -1 }
          {
            \exp_after:wN \token_if_eq_meaning:NNT
                ##1 \__loopspace_file_process_end:
              { \peek_analysis_map_break: }
          }
        % process token:
        \mode_leave_vertical:
        \exp_after:wN \token_to_str:N ##1 ~ ( ##2 ,~ ##3 ) \par
        % --------------
      }
  }
% Only executed if the marker is reached outside the loop,
% i.e. after \peek_analysis_map_break: was used too early.
\cs_new_protected:Npn \__loopspace_file_process_end:
  { \msg_error:nn { loopspace } { premature-break } }
\msg_new:nnn { loopspace } { premature-break }
  {
    Premature~usage~of~\iow_char:N\\peek_analysis_map_break:.\\
    Some~content~may~have~been~left~over.
  }
\ExplSyntaxOff

\documentclass{article}
\begin{filecontents}{tmp.tex}
Lorem ipsum \textbf{dolor sit amet}. \bgroup{}$&#^_
\end{filecontents}
\begin{document}
\ttfamily \processfile{tmp.tex}
\end{document}
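The same \everyeof trick also rescues the first attempt from the question, where the file ended while TeX was still absorbing a macro argument. Below is a sketch, with hypothetical names, that grabs the whole file with an argument delimited by the \q_stop quark: \everyeof inserts \q_stop just before TeX closes the file, so the argument is already complete when the end-of-file check happens. The demo file demo.txt is a placeholder, and \q_stop must not occur in the file itself.

```latex
\documentclass{article}
\ExplSyntaxOn
\tl_new:N \g_demo_file_tl
% Hypothetical names. Everything up to \q_stop becomes #1.
\cs_new_protected:Npn \demo_grab:w #1 \q_stop
  { \tl_gset:Nn \g_demo_file_tl {#1} }
\cs_new_protected:Npn \demo_read_file:n #1
  {
    \group_begin:
      \tex_everyeof:D { \q_stop }
      \exp_after:wN \demo_grab:w \tex_input:D {#1}
    \group_end:
  }
% Document-level wrapper, so the file is tokenized with normal catcodes.
\NewDocumentCommand \readwholefile { m } { \demo_read_file:n {#1} }
\ExplSyntaxOff
\begin{filecontents}[overwrite]{demo.txt}
Hello world
\end{filecontents}
\readwholefile{demo.txt}
\ExplSyntaxOn
\iow_term:x { \tl_to_str:N \g_demo_file_tl }
\ExplSyntaxOff
\begin{document}
\end{document}
```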