[TeX/LaTeX] Search and replace in a verbatim token list

substitution token-lists verbatim

I maintain a package for the R programming language that contains a vignette which gets processed using the Sweave literate programming system. One of the things that irks me about Sweave is that it always wraps its output in two layers of environments, like the following:

\begin{Schunk}
\begin{Sinput}
> Some R command
\end{Sinput}
\begin{Soutput}
The result
\end{Soutput}
\end{Schunk}

Now, I would like to collapse this into something that uses only a single environment by tossing out the \begin and \end commands, or, optionally, by replacing them with delimiters that the listings package can hook into with a moredelim definition:

\begin{Schunk}
> Some R command
swe@veSt@rtOutput
The result
swe@veEndOutput
\end{Schunk}

I am currently achieving this effect through a hack that rewrites the R output driver when Sweave loads the .Rnw file. However, this is proving to be a maintenance burden, as the structure of the driver changes with every R release and I have to update the hack each time. So, I am looking for a way to attack this from the TeX end.

Here is a minimal example of some Sweave output along with my current method of wrapping it into a listing:

\documentclass{minimal}
\usepackage{verbatim}
\usepackage{listings}

\newwrite\listinginput

\makeatletter
\def\startcapture{%
  \begingroup
  \@bsphack
    \immediate\openout\listinginput=\jobname.lst%
    \let\do\@makeother\dospecials\catcode`\^^M\active
    \def\verbatim@processline{%
        % The macro now pipes its input to our temp file.
        \immediate\write\listinginput{\the\verbatim@line}
    }%
    \verbatim@start
}
\def\stopcapture{%
    \immediate\closeout\listinginput
    \@esphack
  \endgroup
}
\makeatother

\newenvironment{Schunk}{\startcapture}%
  {\stopcapture\lstinputlisting{\jobname.lst}}

\begin{document}

\begin{Schunk}
\begin{Sinput}
> getLatexStrWidth("The symbol: alpha")
\end{Sinput}
\begin{Soutput}
[1] 82.5354
\end{Soutput}
\end{Schunk}

\end{document}

In my full version, the Schunk environment also wraps the whole show in a TikZ node, which is why I use verbatim to write things to a temp file and then read them back in with \lstinputlisting (listings doesn't take kindly to being embedded in another environment).

Now, in \verbatim@processline, I have an opportunity to examine things line by line, so I am wondering if there is a way to perform the equivalent of:

s/\\begin{Sinput}/customDelimiter/

on the contents of the token list stored in \verbatim@line. The closest I have gotten is by using the ted package:

\def\verbatim@processline{%
  \Substitute*{\the\verbatim@line}{Sinput}{swe@veSt@rtOutput}
  % The macro now pipes its input to our temp file.
  \immediate\write\listinginput{\the\ted@toks}
}%

This gives me the following output:

\begin{swe@veSt@rtOutput}
> getLatexStrWidth("The symbol: alpha")
\end{swe@veSt@rtOutput}
\begin{Soutput}
[1] 82.5354
\end{Soutput}

However, I am unable to get the following to work:

\Substitute*{\the\verbatim@line}{\begin{Sinput}}{swe@veSt@rtOutput}

Any pointers on how to achieve this act of sourcery would be greatly appreciated!


Update

Aaron's comment led me to discover how to properly escape things like \begin{Sinput} so that listings will treat them as delimiters:

\lstdefinestyle{sweave}{
  moredelim=[is][]
    {\\begin\{Sinput\}}
    {\\end\{Sinput\}},
  moredelim=[is][]
    {\\begin\{Soutput\}}
    {\\end\{Soutput\}}
}

When used with \lstinputlisting[style=sweave], this gives the following output:

. 
> getLatexStrWidth("The symbol: alpha")
.
.
[1] 82.5354
.

The . characters aren't actually in the output, they are just there to show that empty lines get left behind.

I think I can work with this result, but I am leaving the question open in case anyone has an answer to the general question of "How do I s/\\begin{this}/that/ in a token list?".


Update 2

I have been having difficulty using pattern-matching macros inside \verbatim@processline. I suspect this is because that macro is executed while the verbatim catcodes are in force, so many special characters have been changed to catcode 12. Any pattern specified when \verbatim@processline is defined therefore won't have the proper catcodes to generate a match. I've stared at the verbatim documentation long enough to get a headache, and I still can't figure out how to set things up correctly.
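A minimal sketch of one way around this, assuming the Sweave markers always sit alone on their own line with no leading or trailing spaces: define the patterns in a group where \, { and } have catcode 12, exactly as \startcapture leaves them (letters keep catcode 11), and compare each captured line against them with \ifx. The names \Sweave@beginSinput, \Sweave@line and \ifSweave@keep are hypothetical, and this version simply drops the four marker lines (the "tossing out" option from the start of the question) instead of replacing them with delimiters:

\documentclass{article}
\usepackage{verbatim}
\usepackage{listings}

\newwrite\listinginput

\makeatletter
\newif\ifSweave@keep
% Pattern macros tokenized the way \startcapture tokenizes each captured
% line: backslash and braces are catcode 12 ("other"), letters stay 11.
% | temporarily acts as the escape character, ( and ) as group delimiters.
\begingroup
\catcode`\|=0 \catcode`\(=1 \catcode`\)=2
|catcode`|\=12 |catcode`|{=12 |catcode`|}=12
|gdef|Sweave@beginSinput(\begin{Sinput})
|gdef|Sweave@endSinput(\end{Sinput})
|gdef|Sweave@beginSoutput(\begin{Soutput})
|gdef|Sweave@endSoutput(\end{Soutput})
|endgroup

\def\startcapture{%
  \begingroup
  \@bsphack
  \immediate\openout\listinginput=\jobname.lst%
  \let\do\@makeother\dospecials\catcode`\^^M\active
  \def\verbatim@processline{%
    % Copy the line into a macro so \ifx can compare it with each pattern.
    \edef\Sweave@line{\the\verbatim@line}%
    \Sweave@keeptrue
    \ifx\Sweave@line\Sweave@beginSinput  \Sweave@keepfalse\fi
    \ifx\Sweave@line\Sweave@endSinput    \Sweave@keepfalse\fi
    \ifx\Sweave@line\Sweave@beginSoutput \Sweave@keepfalse\fi
    \ifx\Sweave@line\Sweave@endSoutput   \Sweave@keepfalse\fi
    % Only lines that are not Sweave environment markers reach the file.
    \ifSweave@keep
      \immediate\write\listinginput{\the\verbatim@line}%
    \fi}%
  \verbatim@start}
\def\stopcapture{%
  \immediate\closeout\listinginput
  \@esphack
  \endgroup}
\makeatother

\newenvironment{Schunk}{\startcapture}%
  {\stopcapture\lstinputlisting{\jobname.lst}}

\begin{document}
\begin{Schunk}
\begin{Sinput}
> getLatexStrWidth("The symbol: alpha")
\end{Sinput}
\begin{Soutput}
[1] 82.5354
\end{Soutput}
\end{Schunk}
\end{document}

Writing a delimiter such as Swe@veBeginInput in place of a dropped line would also work, but each delimiter would then sit on a line of its own, which brings back the blank lines that listings leaves behind with [is] delimiters.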

The workaround I have come up with is to expand my original example to a three-step process by writing the content captured by verbatim to a temporary file. This output is then re-read line-by-line outside of the verbatim environment where catcodes are in their normal state and pattern matching works. The result of this processing step is then written to a final file that lstinputlisting reads in and processes.

The code now looks like this:

\documentclass{minimal}
\usepackage{verbatim}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{xstring}

\newwrite\listinginput
\newread\tempin

\newif\ifskipline
\newif\ifhaveoutput
\newtoks\linebuffer
\def\addtobuffer#1{%
  \toks0={#1}%
  \edef\act{\noexpand\linebuffer={\the\linebuffer \the\toks0}}%
  \act}
\def\addoutput#1{%
  \expandafter\addtobuffer\expandafter{#1}%
  \haveoutputtrue}
\newtoks\parpattern\parpattern={\par}

\makeatletter
\def\startcapture{%
  \begingroup
  \@bsphack
    \immediate\openout\listinginput=\jobname.tmp%
    \let\do\@makeother\dospecials\catcode`\^^M\active
    \def\verbatim@processline{%
        % The macro now pipes its input to our temp file.
        \immediate\write\listinginput{\the\verbatim@line}
    }%
    \verbatim@start
}
\def\stopcapture{%
    \immediate\closeout\listinginput
    \@esphack
  \endgroup
}
\makeatother

\def\replaceenvs#1{%
  \immediate\openin\tempin=\jobname.tmp
  \immediate\openout\listinginput=\jobname.lst
  \def\prependcode{}
  \begingroup
    % Xstring doesn't like parameter and comment characters
    \catcode`\#=11
    \catcode`\%=11
    \loop
      \ifeof\tempin
        \immediate\write\listinginput{\the\linebuffer}
      \else
        \immediate\read\tempin to \codeline
        \IfBeginWith{\expandafter\string\codeline}{\string\begin{Sinput}}{\addtobuffer{Swe@veBeginInput}\skiplinetrue}{}
        \IfBeginWith{\expandafter\string\codeline}{\string\end{Sinput}}{\addtobuffer{Swe@veEndInput}\skiplinetrue}{}
        \IfBeginWith{\expandafter\string\codeline}{\string\begin{Soutput}}{\addtobuffer{Swe@veBeginOutput}\skiplinetrue}{}
        \IfBeginWith{\expandafter\string\codeline}{\string\end{Soutput}}{\addtobuffer{Swe@veEndOutput}\skiplinetrue}{}
        \IfBeginWith{\expandafter\string\codeline}{\expandafter\string\the\parpattern}{\skiplinetrue}{}
        \ifskipline
          \skiplinefalse
        \else
          \ifhaveoutput
            \immediate\write\listinginput{\the\linebuffer}
            \linebuffer={}\haveoutputfalse
          \fi
          \addoutput{\codeline}
        \fi
    \repeat   
  \endgroup

  \immediate\closein\tempin
  \immediate\closeout\listinginput
}

\lstdefinestyle{sweave}{
  moredelim=[is][\color{red}]
    {Swe@veBeginOutput}
    {Swe@veEndOutput},
  moredelim=[is][\color{blue}]
    {Swe@veBeginInput}
    {Swe@veEndInput}
}

\newenvironment{Schunk}{%
  \startcapture%
}{%
  \stopcapture%
  \replaceenvs{\jobname.tmp}%
  \lstinputlisting[style=sweave]{\jobname.lst}%
}

\begin{document}

\begin{Schunk}
\begin{Sinput}
# This is a comment
> getLatexStrWidth("The symbol: alpha")
\end{Sinput}
\begin{Soutput}
[1] 82.5354%
\end{Soutput}
\end{Schunk}

\end{document}

verbatim works its magic as before between the \startcapture and \stopcapture macros to produce a \jobname.tmp file with the following contents:

\begin{Sinput}
# This is a comment
> getLatexStrWidth("The symbol: alpha")
\end{Sinput}
\begin{Soutput}
[1] 82.5354%
\end{Soutput}

This is reprocessed by the new code in \replaceenvs using the xstring package to create \jobname.lst:

Swe@veBeginInput# This is a comment 
> getLatexStrWidth("The symbol: alpha") Swe@veEndInputSwe@veBeginOutput
[1] 82.5354% Swe@veEndOutput

Listings then processes \jobname.lst to produce:

# This is a comment 
> getLatexStrWidth("The symbol: alpha")
[1] 82.5354%

The first two lines (the input) are highlighted in blue and the last line (the output) in red, matching the colors assigned in the sweave style. This successfully removes the Sinput and Soutput environments, replaces them with delimiters that listings can use to style the code, and leaves no blank lines behind.

However, I really feel like I took the long way around by reading and writing two files. So, I'm leaving this question open and offering a bounty to anyone who can come up with a solution that uses fewer passes.


Final Thoughts

In the end, the problems I had with pattern matching inside \verbatim@processline were due to the fact that I had to supply a pattern that would match both the characters and the catcodes of the strings I was looking for.

For those people like me who are somewhere in the transition zone between TeX users and TeX programmers, here are the details: inside a verbatim environment, the catcodes of many characters are reassigned to 12 ("other"). This is done by the combination of \let\do\@makeother and \dospecials. Staring at the sections of package manuals that document TeX code gives me a headache, so I finally started using the TeX compiler in interactive mode (something I had always ignored until now) along with \show to unravel macro definitions:

grendel:~ sharpie$ pdflatex
This is pdfTeX, Version 3.1415926-1.40.11 (TeX Live 2010)
**\documentclass{minimal}
*\usepackage{verbatim}
*\makeatletter

*\show\@makeother
> \@makeother=macro:
#1->\catcode `#112\relax .

*\show\dospecials
> \dospecials=macro:
->\do \ \do \\\do \{\do \}\do \$\do \&\do \#\do \^\do \_\do \%\do \~.

\let\do\@makeother makes \do an alias for \@makeother, a macro that changes the catcode of its argument to 12. \dospecials applies \do to a list of special characters: the space, \, {, }, $, &, #, ^, _, % and ~. So, if you want to match against a string that contains one of those characters inside a verbatim environment, you have to define your match pattern in a group where those characters have a catcode of 12 (letters keep their usual catcode of 11).
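The same \let\do\@makeother\dospecials trick can be wrapped in a small helper, much like \newsubstitution in the accepted answer below, so that a pattern is always tokenized with the catcodes verbatim uses. This is a sketch: the name \defverbpattern is hypothetical, and the |...| delimiters assume the pattern itself contains no | character:

\makeatletter
% \defverbpattern\name|text| defines \name to hold `text' with the same
% catcodes \startcapture uses: every \dospecials character becomes catcode 12.
\def\defverbpattern#1{%
  \begingroup
  \let\do\@makeother\dospecials
  \defverbpattern@#1}
% The |...| argument is read only after the catcode change above, and the
% \def happens after \endgroup so \name survives the group.
\def\defverbpattern@#1|#2|{%
  \endgroup
  \def#1{#2}}
\makeatother

\defverbpattern\SweaveBeginInput|\begin{Sinput}|
\defverbpattern\SweaveEndOutput|\end{Soutput}|

Because \SweaveBeginInput now contains catcode-12 \, { and } plus catcode-11 letters, it can be compared with \ifx against a macro built with \edef from \the\verbatim@line inside \verbatim@processline, provided the whole line matches exactly.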

Leo Liu and unbonpetit win the prize for pointing out how to do this. It was a tough call deciding who gets the bounty; I wish I could split it in half. In the end, I'm giving it to unbonpetit because they went the extra mile in their example. However, Leo Liu answered first with information that led me to a better implementation, so I am going to go through his list of answers and upvote some of his excellent contributions.

The final version that I ended up with is available as a package on GitHub:

http://github.com/Sharpie/SweaveToLst

Thanks to everyone who answered!

Best Answer

My turn to try something:

\documentclass{minimal}
\usepackage{verbatim}
\usepackage{listings}
\usepackage{ted}
\newwrite\listinginput

\makeatletter

\def\newsubstitution{\begingroup\let\do\@makeother\dospecials\newsubstitution@}
\def\newsubstitution@||#1||<->||#2||{%
    \endgroup
    \expandafter\def\expandafter\subst@list\expandafter{\subst@list\Substitute*[\verbatim@line]{\the\verbatim@line}{#1}{#2}}}
\def\clearsubstlist{\let\subst@list\@empty}
\clearsubstlist

\newenvironment{Schunk}
    {\begingroup
    \@bsphack
    \immediate\openout\listinginput=\jobname.lst%
    \let\do\@makeother\dospecials\catcode`\^^M\active
    \def\verbatim@processline{%
        \subst@list
        \immediate\write\listinginput{\the\verbatim@line}}%
    \verbatim@start}%
    {\immediate\closeout\listinginput
    \@esphack
    \endgroup
    \lstinputlisting{\jobname.lst}}

\makeatother
\begin{document}
\newsubstitution||\begin{Sinput}||<->||Swe@veBeginInput||
\newsubstitution||\end{Sinput}||<->||Swe@veEndInput||
\newsubstitution||\begin{Soutput}||<->||Swe@veBeginOutput||
\newsubstitution||\end{Soutput}||<->||Swe@veEndOutput||
With a substitution list:
\begin{Schunk}
\begin{Sinput}
> getLatexStrWidth("The symbol: alpha")
\end{Sinput}
\begin{Soutput}
[1] 82.5354
\end{Soutput}
\end{Schunk}

\clearsubstlist
With no substitution list:
\begin{Schunk}
\begin{Sinput}
> getLatexStrWidth("The symbol: alpha")
\end{Sinput}
\begin{Soutput}
[1] 82.5354
\end{Soutput}
\end{Schunk}
\end{document}