Find if a string exists within external files

expl3, external files, strings

I am looking for a way to determine whether a particular text or string is present in external .tex files: accept a text input from the user and check whether that text occurs in those files. For example, you may want to check whether the title of an article is present in your .tex or .bib files. Consider the following MWE.

%MWE
\documentclass{article}
%this is a temporary definition, just to declare the command.
\newcommand*{\mySearchforStringinExternalFilesCommand}[4]{#1#2#3#4}
%first argument is the text/string to be searched.
%second argument is a list of external files that will be searched if the given string is present.
%third argument is the output if the string is found.
%fourth argument is the output if the string is not found.

\begin{document}

%I want to check whether the text ``vibration'' is present inside the external files datafileone.tex, datafiletwo.tex, and datafilethree.tex.
%If the text can be found inside the external files, then ``The search phrase ... was found in ... '' will be printed in the pdf file together with the filename/s of the external file/s where the text was found. If not, then ``The search phrase was not found in any of the datafiles.'' will be printed.
%datafileone.tex contains the phrase ``amplitude of vibration''.
%datafiletwo.tex contains the phrase ``frequency of vibration''.
%datafilethree.tex contains the phrase ``instantaneous frequency and instantaneous amplitude''. (these are terms from my thesis :) )
%Therefore, if the macro \mySearchforStringinExternalFilesCommand is designed properly, it must output ``The search phrase `vibration' was found in datafileone.tex and datafiletwo.tex''

\mySearchforStringinExternalFilesCommand%
{vibration}%This is the text or string to be searched in the given external files.
{%These are the external files.
datafileone.tex%
datafiletwo.tex%
datafilethree.tex%
}%
{The search phrase ... was found in ...}%datafileone.tex and/or datafiletwo.tex and/or datafilethree.tex
{The search phrase was not found in any of the datafiles.}%This is printed if the text is not found.

\end{document}

Phelype Oleinik proposed, in expl3 syntax, the command \replacelineonce{<file>}{<search string>}{<replacement>}{<true code>}{<false code>}
for find and replace, in an answer to "How to replace a line in a file written by TeX's \write command". The code, taken from that answer, is as follows:

\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand \replacelineonce { m m m m m }
  { \mountain_replace_once:nnnTF {#1} {#2} {#3} {#4} {#5} }
\NewDocumentCommand \replacelineall { m m m m m }
  { \mountain_replace_all:nnnTF {#1} {#2} {#3} {#4} {#5} }
\tl_new:N \l__mountain_tmpa_tl
\seq_new:N \l__mountain_file_seq
\bool_new:N \l__mountain_replaced_bool
\ior_new:N \l__mountain_replace_ior
\iow_new:N \l__mountain_replace_iow
\prg_new_protected_conditional:Npnn \mountain_replace_once:nnn #1 #2 #3 { T, F, TF }
  { \__mountain_replace_aux:Nnnn \c_false_bool {#1} {#2} {#3} }
\prg_new_protected_conditional:Npnn \mountain_replace_all:nnn #1 #2 #3 { T, F, TF }
  { \__mountain_replace_aux:Nnnn \c_true_bool {#1} {#2} {#3} }
\cs_new_protected:Npn \__mountain_replace_aux:Nnnn #1 #2 #3 #4
  {
    \ior_open:NnTF \l__mountain_replace_ior {#2}
      { \__mountain_replace_line:Nnnn #1 {#3} {#4} {#2} }
      {
        \msg_error:nnn { mountain } { file-not-found } {#2}
        \prg_return_false:
      }
  }
\cs_new_protected:Npn \__mountain_replace_line:Nnnn #1 #2 #3 #4
  {
    \seq_clear:N \l__mountain_file_seq
    \bool_set_false:N \l__mountain_replaced_bool
    \ior_str_map_inline:Nn \l__mountain_replace_ior
      {
        \str_if_eq:nnTF {##1} {#2}
          {
            \bool_set_true:N \l__mountain_replaced_bool
            \seq_put_right:Nn \l__mountain_file_seq {#3}
            \bool_if:NF #1
              { \ior_map_break:n { \__mountain_replace_skip: } }
          }
          { \seq_put_right:Nn \l__mountain_file_seq {##1} }
      }
    \__mountain_replace_end:n {#4}
  }
\cs_new_protected:Npn \__mountain_replace_skip:
  {
    \ior_str_map_inline:Nn \l__mountain_replace_ior
      { \seq_put_right:Nn \l__mountain_file_seq {##1} }
  }
\cs_new_protected:Npn \__mountain_replace_end:n #1
  {
    \ior_close:N \l__mountain_replace_ior
    \iow_open:Nn \l__mountain_replace_iow {#1}
    \seq_map_inline:Nn \l__mountain_file_seq
      { \iow_now:Nn \l__mountain_replace_iow {##1} }
    \iow_close:N \l__mountain_replace_iow
    \bool_if:NTF \l__mountain_replaced_bool
      { \prg_return_true: }
      { \prg_return_false: }
  }
\msg_new:nnn { mountain } { file-not-found }
  { File~`#1'~not~found. }
\ExplSyntaxOff

\begin{document}

\newwrite\tempfile
\immediate\openout\tempfile=lists.tex
\immediate\write\tempfile{line1}
\immediate\write\tempfile{}
\immediate\write\tempfile{line2}
\immediate\write\tempfile{}
\immediate\write\tempfile{line2}
\immediate\write\tempfile{}
\immediate\write\tempfile{line2}
\immediate\closeout\tempfile

\replacelineonce{lists.tex}{line2}{line replaced}
  {Replaced once:}
  {Nothing replaced:}

\input{lists}
\bigskip

\replacelineall{lists.tex}{line2}{line replaced}
  {Replaced all:}
  {Nothing replaced:}

\input{lists}
\bigskip

\replacelineonce{lists.tex}{line2}{line replaced}
  {Replaced once:}
  {Nothing replaced:}

\input{lists}
\bigskip

\end{document}

My interest, though, is only to “find”, and not “find and replace”.

Another similar topic is "Find and replace in a document consisting of many 'included' files using \include".

Kindly seeking your help.

Best Answer

EDITED to overcome limitations on input characters of certain catcodes. Note that in datafileone.tex the word vibration is part of the definition of a macro that takes an argument. In datafiletwo.tex the word vibration is part of a comment, which is also searched unless you comment out the marked \catcode line in the definition of \killcats. The file datafilefour.tex was added to provide a case where the search terms are not found.

WARNING: In the current incarnation of the readarray package (which will be remedied in a future update), end-of-lines are always discarded during a \readdef and replaced with the value of \readarraysepchar, which is not the natural LaTeX way of reading end-of-lines. This could affect searches where the search string spans multiple lines of input.

Arguments #3 and #4 of \mySearchforStringinExternalFilesCommand are expected to, themselves, take 2 and 1 arguments, respectively, regardless of whether they do anything with them. In the case of #3 the two arguments passed include the search string, and the filename where the match was found. In the case of #4, the argument passed is the search string.
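As an illustration of that calling convention, handlers other than the \searchtrue and \searchfalse used in the listing below might look like the following fragment. The names \reportfile and \reportnone are hypothetical, and the fragment assumes the definition of \mySearchforStringinExternalFilesCommand given in this answer, so it is a sketch rather than a complete document.

```latex
% Hypothetical "found" handler: #1 = search string, #2 = file with a match.
\newcommand\reportfile[2]{\texttt{#2} contains ``#1''.\par}
% Hypothetical "not found" handler: it receives the search string as #1
% but is free to ignore it.
\newcommand\reportnone[1]{No file matched.\par}

% Pass the handlers by name; the search macro supplies their arguments.
\mySearchforStringinExternalFilesCommand
  {vibration}
  {datafileone.tex, datafiletwo.tex}
  {\reportfile}{\reportnone}
```

The found handler is called once per matching file, while the not-found handler is called at most once, after all files have been checked.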

EDITED to demonstrate an OR search, where multiple search strings can be specified simultaneously with the listofitems OR operator ||, as in vibration||frequency.
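The idea behind the search can be sketched in isolation: the search string is declared as the listofitems separator, the text is split at every occurrence, and more than one resulting piece signals a match. The list name \pieces and the sample sentence are made up for this illustration.

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
% Treat either search word as a separator (|| is the listofitems OR).
\setsepchar{vibration||frequency}
% Split a sample text at every occurrence of a separator.
\readlist\pieces{amplitude of vibration and frequency of motion}
% \pieceslen holds the number of pieces; more than one piece means
% that at least one of the search words occurred in the text.
\ifnum\pieceslen>1 match\else no match\fi
\end{document}
```

The same test with a single word in \setsepchar gives the plain, non-OR search.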

\begin{filecontents*}[overwrite]{datafileone.tex}
\today \def\mashit#1{\textit{amplitude #1 of vibration}}
\end{filecontents*}
\begin{filecontents*}[overwrite]{datafiletwo.tex}
frequency of something% REMEMBER TO CALL IT vibration
\end{filecontents*}
\begin{filecontents*}[overwrite]{datafilethree.tex}
instantaneous frequency and instantaneous amplitude
\end{filecontents*}
\begin{filecontents*}[overwrite]{datafilefour.tex}
none of the above
\end{filecontents*}

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{readarray,listofitems}
\readarraysepchar{ }% CURRENT readarray VERSION WILL INSERT THIS
% AUTOMATICALLY AFTER EACH INPUT RECORD IS READ (EVEN IF RECORD ENDS
% ON A MACRO OR `%')
\def\killcats{%
  \catcode`\#=12
  \catcode`\%=12 % COMMENT TO AVOID SEARCH OF COMMENTS
  \catcode`\\=12 
  \catcode`\{=12 
  \catcode`\}=12 }
\def\restorecats{%
  \catcode`\\=0 
  \catcode`\}=2 
  \catcode`\{=1 
  \catcode`\%=14
  \catcode`\#=6 }%

\newcommand*{\mySearchforStringinExternalFilesCommand}[4]{%
  \def\findstatus{F}%
  \setsepchar{,}%
  \readlist*\filelist{#2}%
  \setsepchar{#1}%
  \foreachitem\z\in\filelist[]{%
    \killcats
    \expandafter\readdef\expandafter{\z}\tmpfile
    \restorecats
    \readlist\searchlist{\tmpfile}%
    \ifnum\searchlistlen>1\relax#3{#1}{\z}\def\findstatus{T}\fi
  }
  \if F\findstatus #4{#1}\fi
}
\newcommand\searchtrue[2]{The search phrase ``#1'' was found in #2.\par}
\newcommand\searchfalse[1]{The search phrase ``#1'' was not found in 
  any of the datafiles.\par}
\begin{document}
\mySearchforStringinExternalFilesCommand%
{vibration}
{datafileone.tex, datafiletwo.tex, datafilethree.tex, datafilefour.tex}
{\searchtrue}{\searchfalse}

\bigskip
\mySearchforStringinExternalFilesCommand%
{frequency}
{datafileone.tex, datafiletwo.tex, datafilethree.tex, datafilefour.tex}
{\searchtrue}{\searchfalse}

\bigskip
\mySearchforStringinExternalFilesCommand%
{vibration||frequency}
{datafileone.tex, datafiletwo.tex, datafilethree.tex, datafilefour.tex}
{\searchtrue}{\searchfalse}
\end{document}
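For comparison, a find-only variant in the style of the expl3 code quoted in the question could be sketched as follows. This is not part of the readarray answer above: the command name \searchfiles and the strsearch prefix are hypothetical, a LaTeX kernel from 2020-10 or later is assumed, and the search is done line by line, so a string that spans two lines of a file is not found.

```latex
\begin{filecontents*}[overwrite]{datafileone.tex}
amplitude of vibration
\end{filecontents*}
\begin{filecontents*}[overwrite]{datafilethree.tex}
instantaneous frequency and instantaneous amplitude
\end{filecontents*}

\documentclass{article}
\ExplSyntaxOn
\ior_new:N \g__strsearch_ior
\seq_new:N \l__strsearch_hits_seq
\msg_new:nnn { strsearch } { file-not-found } { File~`#1'~not~found. }

% Scan the file #2 line by line for the string #1.
\cs_new_protected:Npn \__strsearch_file:nn #1 #2
  {
    \ior_open:NnTF \g__strsearch_ior {#2}
      {
        % Each line is read as a string, so \, %, # and braces are inert.
        \ior_str_map_inline:Nn \g__strsearch_ior
          {
            \str_if_in:nnT {##1} {#1}
              {
                % Record the file once and stop reading it.
                \seq_put_right:Nn \l__strsearch_hits_seq {#2}
                \ior_map_break:
              }
          }
        \ior_close:N \g__strsearch_ior
      }
      { \msg_warning:nnn { strsearch } { file-not-found } {#2} }
  }

% #1 = search string, #2 = comma-separated list of files.
\NewDocumentCommand \searchfiles { m m }
  {
    \seq_clear:N \l__strsearch_hits_seq
    \clist_map_inline:nn {#2} { \__strsearch_file:nn {#1} {##1} }
    \seq_if_empty:NTF \l__strsearch_hits_seq
      { The~search~phrase~``#1''~was~not~found~in~any~of~the~datafiles. }
      {
        The~search~phrase~``#1''~was~found~in~
        \seq_use:Nnnn \l__strsearch_hits_seq { ~and~ } { ,~ } { ,~and~ }.
      }
  }
\ExplSyntaxOff

\begin{document}
\searchfiles{vibration}{datafileone.tex, datafilethree.tex}

\searchfiles{resonance}{datafileone.tex, datafilethree.tex}
\end{document}
```

Because the lines are read with \ior_str_map_inline:Nn, comments in the files are searched as well, and no catcode changes are needed.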

