MATLAB: How to extract text from string at the same location, one line above

MATLAB, performance, search, speed, strfind, string

I have a variable number of text files (between 3 and 8), each between 20,000 and 30,000 lines long (the lengths differ), and around 400 words to search for. The words have different lengths.
Let's say I have the following text:
xxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxx999xxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx12345xxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
where xxxxx can be anything other than what I want to search for. I want to check whether the following is true:
  1. That each text file includes '12345'
  2. That for at least one occurrence of '12345' in each file, there is '999' on the line directly above. The end of '999' always falls in the same column as the end of '12345'.
I can determine whether '12345' is in each of the text files using strfind, but strfind only outputs an "index" value for the first character of my search pattern (e.g. 613587). Is there a way to find the line number that this "index" value corresponds to, and search one line above for '999'?
I think I saw people recommending that each line for each file be read as a separate string, then search each string independently, but that seems like a lot of work for MATLAB to go through, having to generate close to a hundred thousand strings. Is there a better/more efficient way of achieving this?
Any help would be appreciated!

Best Answer

"Is there a better/more efficient way of achieving this?" No, I don't think so. However, speed depends on how "each line for each file be read as a separate string" is done. (Are strings in an array separate?)
"that seems like a lot of work for MATLAB" Don't guess and don't rely on hearsay. Make a simple test.
I assume that your example is oversimplified and that the script below won't work with the actual files. However, it might help you to estimate execution times.
I made a test file, cssm.txt, with 30,000 lines by copying and modifying lines from your question. It contains only one pair
xxxxxxxxxxxxxxxxxxxxxxxxxx999xxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx12345xxx
which is at line 15001.
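A file of this shape can be rebuilt roughly as follows. This is only a sketch: the filler line, the x counts, and the placement of the '12345' line at 15002 are assumptions, not the contents of the original test file.

```matlab
% Sketch: rebuild a 30,000-line test file similar to the one described.
% Filler content and x counts are assumptions, not the original file.
nLines = 30000;
cac = repmat( { repmat('x',1,47) }, nLines, 1 );
% One pair whose patterns both end in column 29
cac{15001} = [ repmat('x',1,26), '999',   repmat('x',1,13) ];
cac{15002} = [ repmat('x',1,24), '12345', repmat('x',1,3)  ];
fid = fopen( 'cssm.txt', 'wt' );
fprintf( fid, '%s\n', cac{:} );   % one line per cell
fclose( fid );
```

Note that this overwrites any existing cssm.txt in the current folder.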
The script below contains strfind(chr,'12345') for comparison, followed by two independent solutions. The elapsed times for the three cases, in that order, are
Elapsed time is 0.006313 seconds.
Elapsed time is 0.053587 seconds.
Elapsed time is 0.020818 seconds.
on a vanilla desktop with R2018b. The second solution (the third timing) takes less than four times as long as the fileread(); strfind(); baseline. Eight files and four hundred words should be possible to process in a bit more than one minute (8*400*0.02 = 64 s). During my test the text file was somewhere in the cache system. The execution time will depend (a little) on whether you have an SSD or a spinning disk.
%%
%#ok<*NASGU>
tic
chr = fileread('cssm.txt');
pos = strfind( chr, '12345' );
toc
%%
tic
fid = fopen('cssm.txt','rt');
cac = textscan( fid, '%s', 'Delimiter','\n' );
str = reshape( string( cac{1} ), 1,[] );
fclose( fid );
%%
e1 = regexp( str, "999", 'end', 'once' );
e2 = regexp( str, "12345", 'end', 'once' );
is1 = not( cellfun( 'isempty', e1 ) );
is2 = not( cellfun( 'isempty', e2 ) );
%%
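% All candidate lines: '999' on line p and '12345' on line p+1
pos = find( is1 & [ is2(2:end), false ] );
%
found = false;
for p = reshape( pos, 1,[] )
    % Accept only if both patterns end in the same column
    if e1{p}==e2{p+1}
        found = true;
        break
    end
end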
toc
%%
tic
fid = fopen('cssm.txt','rt');
cac = textscan( fid, '%s', 'Delimiter','\n' );
str = reshape( string( cac{1} ), 1,[] );
fclose( fid );
%%
is1 = contains( str, "999" );
is2 = contains( str, "12345" );
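% All candidate lines: '999' on line p and '12345' on line p+1
pos = find( is1 & [ is2(2:end), false ] );
%
found = false;
for p = reshape( pos, 1,[] )
    % Accept only if both patterns end in the same column
    if regexp(str(p),"999",'end','once') == regexp(str(p+1),"12345",'end','once')
        found = true;
        break
    end
end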
toc
(There are edge cases for which this script will throw errors.)
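As for the literal question of turning a strfind index into a line number: one possible approach is to count the newline characters in front of each index. The sketch below is not part of the timed script above and has not been benchmarked. The variable names (word, marker, eol, lineNo, colEnd) and the inline sample text are made up for illustration. For a real file, chr would come from fileread.

```matlab
% Sketch: map strfind indices to line and column, then look one line up.
% Sample text stands in for chr = fileread('cssm.txt');
chr    = sprintf( '%s\n', 'xxxx', 'xxxxxx999xx', 'xxxx12345x', 'xxx' );
word   = '12345';
marker = '999';
chr( chr == char(13) ) = [];              % drop CR so CRLF files keep their columns
isNl   = ( chr == newline );
eol    = [ 0, find( isNl ) ];             % eol(k) = index just before line k starts
nlCnt  = cumsum( isNl );                  % newlines at or before each character
idx    = strfind( chr, word );            % start index of every occurrence
lineNo = 1 + nlCnt( idx );                % line number of each occurrence
colEnd = idx + numel(word) - 1 - eol( lineNo );   % column where each occurrence ends
found  = false;
for k = find( lineNo > 1 )                % the first line has no line above it
    prevStart = eol( lineNo(k)-1 );       % index just before the previous line
    prevEnd   = eol( lineNo(k) ) - 1;     % last character of the previous line
    last      = prevStart + colEnd(k);    % same column, one line up
    first     = last - numel(marker) + 1;
    if first > prevStart && last <= prevEnd && strcmp( chr(first:last), marker )
        found = true;
        break
    end
end
```

With this layout the file is read once and searched as a single char vector, so the per-word cost stays close to the plain strfind case. The cumsum vector costs one double per character, which is small for files of this size.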