MATLAB: Text file data processing

data filedata parsingtext file processing

Dear Experts, It would be great help if somebody can help on this.
I have a text file which I want to read. It has couple of measurements data. In the beginning it starts with texts followed by a data set and then text again and another data set and so on. I want to read each data set and I dont care about the text.The text file looks like:
if true
:MSR 2 # No. of measurement in file
:SYS BDS 0 # Beam Data Scanner System
#
# RFA300 ASCII Measurement Dump ( BDS format )
#
...............
..............
!
#
# X Y Z Dose
#
= -123.9 0.0 32.0 11.8
= -123.6 0.0 32.0 11.9
= -123.2 0.0 32.0 12.1
= -122.7 0.0 32.0 12.2
= -122.2 0.0 32.0 12.5
= -121.7 0.0 32.0 12.6
= -121.4 0.0 32.0 12.6
:EOM # End of Measurement
#
# RFA300 ASCII Measurement Dump ( BDS format )
#
# Measurement number 2
#
%VNR 1.0
!
#
# X Y Z Dose
#
= 132.0 0.0 100.0 8.1
= 131.7 0.0 100.0 8.2
= 131.3 0.0 100.0 8.2
= 130.8 0.0 100.0 8.3
= 130.3 0.0 100.0 8.4
= 129.8 0.0 100.0 8.6
= 129.3 0.0 100.0 8.8
= 129.0 0.0 100.0 8.8
= 128.5 0.0 100.0 8.9
= 128.0 0.0 100.0 9.2
= 127.5 0.0 100.0 9.3
= 127.2 0.0 100.0 9.4
:EOM # End of Measurement
:EOF # End of File
end
From this file I want to read the numerical data under the column header, i.e, X Y Z dose for each measurement. Attached is my text file.
I greatly appreciate any help. Thanks. Rafiq

Best Answer

This code works with your original data file (which I uploaded here too). It parses most of the file into a structure, which has size 1-by-(number of measurements). Save the code below in a script and run it:
% Read file into a string:
str = fileread('test.txt');
% Check the number of measurements:
N = sscanf(str,':MSR %d');
[S,E] = regexp(str,'(?<=^# Measurement number).*?^:EOM','lineanchors');
assert(all(N==[numel(S),numel(E)]),'This file is incomplete or corrupted.')
% Preallocate structure:
out(N) = struct();
% Loop over each measurement in the file:
for n = 1:N
sub = str(S(n):E(n));
out(n).num = sscanf(sub,'%d');
% Assign fields and values:
tkn = regexp(sub,'^%(\w{3})([^#]*?)(#\s.*?)?\s*$','lineanchors','tokens');
for m = 1:numel(tkn)
tmp = tkn{m};
out(n).(tmp{1}) = strtrim(tmp{2});
if ~isempty(tmp{3})
out(n).([tmp{1},'_note']) = tmp{3}(3:end);
end
end
% Assign header:
tmp = regexp(sub,'^#(\s+\w+)+\s+#\s+=','tokens','lineanchors','once');
out(n).hdr = regexp(strtrim(tmp),'\s+','split');
% Assign data:
tmp = regexp(sub,'^=\s+(\s+[\d\.-]+)+\s*$','match','lineanchors');
out(n).dat = cell2mat(textscan([tmp{:}],'=%f%f%f%f'));
end
%
Explore the structure in your variable viewer, it should be fairly self-explanatory as it uses the same fieldnames as your data file. There are only three new fields: "num" (measurement number), "hdr" (numeric matrix column headers), and "dat" (numeric matrix).