MATLAB: Read specific data columns from a text file based on header name requested by user

MATLAB, text file

Hello,
I am using MATLAB R2018a.
I'm trying to extract specific columns from a text file based on each column's header name. I have tried a couple of different methods, such as readtable and textscan, but none of them worked as I expected.
I have attached the text file itself. I want to make sure the code is not slow, because there are thousands of these files that I will need to process, possibly in a for-loop.
The structure never changes, but the header columns can be in different positions, which is why I want the code to find the header name no matter which position the column is in.
As can be seen in the sample below, the same dates are repeated with different headers (information); this happens 3-4 times in the actual text file. If I know how to pick out the "WOPR-PROD1", "WOPR-PROD2", and "FOPT" columns and put them into a matrix in the order [WOPR-PROD1; WOPR-PROD2; FOPT], I believe I can figure out the rest. I would prefer not to modify the text file itself if possible.
Here is a sample from the text file:
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"SUMMARY OF RUN Original_1
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"DATE ""YEARS ""FOPR ""FWPR ""FGPR ""FOPT ""FGPT ""FWPT ""FWCT ""FWIR "
" ""YEARS ""STB/DAY ""STB/DAY ""MSCF/DAY ""STB ""MSCF ""STB "" ""STB/DAY "
" "" "" "" "" "" "" "" "" "" "
" "" "" "" "" "" "" "" "" "" "
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
" 1JAN2009" 0 0 0 0 0 0 0 0 0
" 1FEB2009" 0.084873 0 0 0 0 0 0 0 0
" 1MAR2009" 0.161533 2000.000 65.16867 1360.000 56000.00 38080.00 1824.723 0.031556 0
" 1APR2009" 0.246407 2000.000 67.93040 1360.000 118000.0 80240.00 3906.001 0.032849 0
" 1MAY2009" 0.328542 2449.850 53.91752 1665.898 191495.5 130216.9 5523.527 0.021535 0
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"SUMMARY OF RUN Original_1 "
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"DATE ""FWIT ""FGOR ""FOIP ""FWIP ""FGIP ""FPR ""WOPR ""WOPR ""WOPR "
" ""STB ""MSCF/STB ""STB ""STB ""MSCF ""PSIA ""STB/DAY ""STB/DAY ""STB/DAY "
" "" "" ""*10**3 ""*10**3 ""*10**3 "" "" "" "" "
" "" "" "" "" "" "" ""PROD1 ""PROD2 ""PROD3 "
" "" "" "" "" "" "" "" "" "" "
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
" 1JAN2009" 0 0 31190.54 645456.1 21209.57 6553.930 0 0 0
" 1FEB2009" 0 0 31190.54 645456.1 21209.57 6553.922 0 0 0
" 1MAR2009" 0 0.680000 31134.54 645454.2 21171.49 6473.267 0 0 0
" 1APR2009" 0 0.680000 31072.54 645452.2 21129.33 6394.598 0 0 0
" 1MAY2009" 0 0.680000 30999.18 645450.7 21079.44 6296.722 0 1675.190 0
Any help is appreciated. Thank you.

Best Answer

Before I continue, I want to thank @Bob Nbob and @Stephen Cobeldick for their work, suggestions, and help.
I really appreciate everything you guys are doing for this community.
For anyone who is interested in this post and was waiting for an answer: it took me a while, but I finally got it to work correctly. The code is a little long, and I did not choose very good variable names or write detailed enough comments.
If anyone has suggestions for making the code more efficient, I'll consider them.
I don't know if 0.027222 seconds (from beginning to end) is efficient enough for this kind of task: reading 68 columns of information from a text file full of "pages" of columns with no delimiters between columns.
The output is a 1x68 cell array called "storage", and each of its 68 cells holds a 99x1 or 100x1 cell array. The contents can be converted to numeric values later, after removing the trailing "." from some numbers (see the number "8713996." in the FOPT column in the first cell, towards the last rows: there is no "0" after the decimal point).
Since the output is a cell array, it can also be converted to a different type of array with functions such as cell2mat (which I have not used in this code).
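The numeric conversion mentioned above is not part of the script in this answer. A minimal sketch of one way to do it follows, assuming a single column shaped like one cell of "storage". The names column_cells, data_rows, cleaned, and values, the sample contents, and the number of header rows are all assumptions for illustration.

```matlab
% Hypothetical sample: one stored column, i.e. a cell array of
% 13-character vectors whose first rows hold header and unit text.
column_cells = {'"FOPT       "'; '"STB        "'; ...
                '     56000.00'; '     118000.0'; '     8713996.'};

% Keep only the data rows (assumption: the first two rows are header text)
data_rows = column_cells(3:end);

% Trim the padding, then drop a trailing "." such as in "8713996."
cleaned = regexprep(strtrim(data_rows), '\.$', '');

% str2double accepts a cell array of character vectors and returns NaN
% for any entry it cannot parse
values = str2double(cleaned);

% Flag rows that failed to convert so they can be inspected
bad_rows = isnan(values);
```

In the real output the number of header rows differs between pages, so the row offset would need to be determined per column rather than hard-coded. The script itself follows.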
%% Read txt file
% Reset all the variables
clear;
clc;
% Read the content of the text file into memory
content = fileread('Original_1.txt');
% Declare the desired strings (column headers); their occurrences set the
% size of the storage cell and select the columns to store
desired_string = ["FOPT", "FGPT", "FWPT", "WOPR", "WWPR", "WOPT", "WWPT", "WGPT", "WGPR", "WBHP", "WGOR"];
% Create a cell array with one cell per occurrence of a desired string
count_string = count(content, desired_string);
storage = cell(1, count_string);
% Delete unnecessary strings and special characters for readability (this
% should leave 130 characters "per line"; the content variable in the
% workspace is a character vector of size 1x154485). Warning: newline and
% carriage return characters also need to be deleted to get 130 characters
% "per line".
new_content = regexprep(content, '"SUMMARY\s.*$|"-.*$|SUMMARY\s.*$|^\s+|\n|\r', '', 'lineanchors', 'dotexceptnewline');
% If the first 13 characters contain '"' or 'DATE', insert '!' before each
% DATE header as a delimiter to separate the content into pages
search_string = {'"','DATE'};
first_13 = new_content(1:13);
if contains(first_13, search_string(1,1))
    new_str = insertBefore(new_content, '"DATE', '!');
    final_str = strsplit(new_str, '!');
elseif contains(first_13, search_string(1,2))
    new_str = insertBefore(new_content, 'DATE', '!');
    final_str = strsplit(new_str, '!');
else
    % Without this branch final_str would be undefined further down
    error('No DATE header found in the first 13 characters.');
end
% Search for a specific string and read all the columns
pages = length(final_str);
add = 13;
% Character offset of the current field within the header line
offset = 0;
next_column = 0;
% Loop over each page of the content (final_str{1} holds whatever precedes
% the first '!', so the pages start at index 2)
for ii = 2:pages
    % Convert cell into character vector
    new_char_vec = char(final_str(1,ii));
    % Length of the character vector for this page
    long = length(new_char_vec);
    % Number of rows on this page
    row_no = long/130;
    % Read the first 130 characters, 13 characters at a time, 10 times and
    % match the desired string
    for i = 1:10
        row = 1 + offset;
        column = 13 + offset;
        testing = new_char_vec(1, (row:column));
        offset = offset + add;
        % If the field contains a desired string (e.g. 'FOPT'), read the
        % whole column below it
        if contains(testing, desired_string)
            % Move to next column of storage cell when string is found
            next_column = next_column + 1;
            % Create cell array inside each column of storage cell array
            % (row_no varies from page to page)
            storage{1,next_column} = cell(row_no,1);
            % read_rows and read_columns are reset to row and column
            % every time the desired_string is found
            read_rows = row;
            read_columns = column;
            % Store the desired string column's first row
            storage{1,next_column}{1,1} = new_char_vec(1, (read_rows:read_columns));
            % Store each remaining row of the column (starting from the
            % 2nd row); consecutive rows are 130 characters apart
            for jj = 2:row_no
                read_rows = read_rows + 130;
                read_columns = read_columns + 130;
                storage{1,next_column}{jj,1} = new_char_vec(1, (read_rows:read_columns));
            end
        end
    end
    % Reset offset for the next page
    offset = 0;
end
% Clear all the unnecessary variables
clear add column content offset count_string desired_string final_str first_13 i ii jj long new_char_vec new_content new_str next_column pages read_rows read_columns row row_no search_string testing
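
The inner loops above step through each page 13 characters at a time. As a possible simplification (not something tested on the original files), the same fixed-width slicing can be sketched with reshape, assuming every line of a cleaned page has the same width and the quote characters have already been removed. The sample page, the reduced fields_per_line value, and all variable names below are hypothetical.

```matlab
field_width = 13;       % width of one column, as in the script above
fields_per_line = 3;    % the real files use 10; 3 keeps the sample short

% Hypothetical page: a header line followed by two data lines, every field
% padded to field_width characters and concatenated without line breaks
page = [sprintf('%-13s', 'DATE', 'FOPT', 'WOPR'), ...
        sprintf('%13s', '1JAN2009', '0', '0'), ...
        sprintf('%13s', '1MAY2009', '191495.5', '1675.190')];

line_width = field_width * fields_per_line;
n_lines = numel(page) / line_width;

% reshape fills column-wise, so build a (line_width x n_lines) matrix and
% transpose it to get one text line per row
page_lines = reshape(page, line_width, n_lines).';

% Split the header line into fields and locate the requested header
header_fields = strtrim(cellstr(reshape(page_lines(1, :), field_width, fields_per_line).'));
idx = find(strcmp(header_fields, 'WOPR'), 1);

% Slice that field from every data line and convert it to numbers
cols = (idx - 1) * field_width + (1:field_width);
values = str2double(strtrim(cellstr(page_lines(2:end, cols))));
```

Real pages have several header rows (name, unit, multiplier, well name), so distinguishing "WOPR-PROD1" from "WOPR-PROD2" would mean comparing more than the first line; this sketch only shows the indexing idea.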