MATLAB: Read text file line by line, and then store the information into a struct

regexp, struct, tline

Hi everyone,
I am trying to read a file that looks something like this:
Data Sampling Rate: 256 Hz
***********************
Channels in EDF Files:
********************
Channel 1: FP1-F7
Channel 2: F7-T7
Channel 3: T7-P7
Channel 4: P7-O1
File Name: chb01_02.edf
File Start Time: 12:42:57
File End Time: 13:42:57
Number of Seizures in File: 0
File Name: chb01_03.edf
File Start Time: 13:43:04
File End Time: 14:43:04
Number of Seizures in File: 1
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds
So far I have:
fid1= fopen('chb01-summary.txt')
data=struct('id',{},'stime',{},'etime',{},'seizenum',{},'sseize',{},'eseize',{});
if fid1 ==-1
error('File cannot be opened ')
end
tline= fgetl(fid1);
while ischar(tline)
i=1;
disp(tline);
I want to use regexp to find the expressions, and so far I have:
line1= '(.*\d{2} (\.edf)'
data{1} = regexp(tline, line1);
tline=fgetl(fid1);
time = '^Time: .*\d{2]}: \d{2} :\d{2}' ;
data{2}= regexp(tline,time);
tline=getl(fid1);
seizure = '^File: .*\d';
data{4}= regexp(tline,seizure);
if data{4}>0
stime = '^Time: .*\d{5}';
tline=getl(fid1);
data{5}= regexp(tline,seizure);
tline= getl(fid1);
data{6}= regexp(tline,seizure);
end
I also tried using a loop to find the line where the file name starts:
if true
for (firstline<1) (firstline>1 )
firstline= strfind(tline, 'File Name')
tline=fgetl(fid1);
end
end
And then I am stumped. Say I am at the line that contains the information: how do I store it with regexp? I got data = [] [] after running the code once.
Thanks in advance.

Best Answer

I would go for a solution close to what you implemented, but splitting the file content first into blocks of data. For example:
buffer = fileread('chb01-summary.txt') ;
blocks = regexp(buffer, 'File Name', 'split') ;
if length(blocks) < 2, error('No content found in file.') ; end
blocks = blocks(2:end) ; % 1st block is header.
nBlocks = length(blocks) ;
results = cell(nBlocks, 1) ;
for bId = 1 : nBlocks
    results{bId}.fileName = regexp(blocks{bId}, '\S+(?=\.edf)', 'match') ;
    results{bId}.fileStartTime = regexp(blocks{bId}, ...
        '(?<=File Start Time:\s*)\S+', 'match') ;
    results{bId}.fileEndTime = regexp(blocks{bId}, ...
        '(?<=File End Time:\s*)\S+', 'match') ;
    results{bId}.nSeizures = str2double( regexp(blocks{bId}, ...
        '(?<=in File:\s*)\d+', 'match') ) ;
    if results{bId}.nSeizures > 0
        results{bId}.seizures = regexp(blocks{bId}, ...
            'Seizure Start Time: (?<startTime>\d+).+?Seizure End Time: (?<endTime>\d+)', 'names') ;
    end
end
With that, you get:
>> results
results =
[1x1 struct]
[1x1 struct]
>> results{1}
ans =
fileName: {'chb01_02'}
fileStartTime: {'12:42:57'}
fileEndTime: {'13:42:57'}
nSeizures: 0
>> results{2}
ans =
fileName: {'chb01_03'}
fileStartTime: {'13:43:04'}
fileEndTime: {'14:43:04'}
nSeizures: 1
seizures: [1x1 struct]
>> results{2}.seizures
ans =
startTime: '2996'
endTime: '3036'
What is left is probably a few conversions to numeric for the relevant times.
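One possible way to do those conversions is sketched below. It assumes the clock times are always in HH:MM:SS form, and hmsToSeconds is a hypothetical helper name used only for illustration, not something from the code above.

```matlab
% Sketch: convert the strings returned by regexp into numbers.
% hmsToSeconds is a hypothetical helper; it assumes HH:MM:SS input.
hmsToSeconds = @(str) [3600 60 1] * sscanf(str, '%d:%d:%d') ;

startStr = '12:42:57' ;   % e.g. results{1}.fileStartTime{1}
endStr   = '13:42:57' ;   % e.g. results{1}.fileEndTime{1}

startSec    = hmsToSeconds(startStr) ;   % seconds since midnight
endSec      = hmsToSeconds(endStr) ;
durationSec = endSec - startSec ;        % recording length in seconds

% Seizure times are plain integers stored as strings.
seizureStart = str2double('2996') ;      % e.g. results{2}.seizures(1).startTime
```

With this arithmetic, a recording that runs past midnight would give a negative durationSec, so that case would need separate handling.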
Note that results{k}.seizures is a struct array, so if the 2nd entry in your file had been
File Name: chb01_03.edf
File Start Time: 13:43:04
File End Time: 14:43:04
Number of Seizures in File: 2
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds
Seizure Start Time: 2997 seconds
Seizure End Time: 3037 seconds
the seizure times would be accessible through:
>> results{2}.seizures
ans =
1x2 struct array with fields:
startTime
endTime
>> results{2}.seizures(1)
ans =
startTime: '2996'
endTime: '3036'
>> results{2}.seizures(2)
ans =
startTime: '2997'
endTime: '3037'
>> results{2}.seizures(2).startTime
ans =
2997
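Because the seizures are held in a struct array, all start or end times can be gathered in one step. The sketch below builds a stand-in struct array by hand, assumed to have the same shape as results{2}.seizures in the two-seizure example, and converts its fields to numeric vectors.

```matlab
% Sketch: a stand-in for results{2}.seizures with two seizures.
seizures = struct('startTime', {'2996', '2997'}, ...
                  'endTime',   {'3036', '3037'}) ;

% seizures.startTime expands to a comma-separated list; wrapping it in {}
% collects the strings in a cell array that str2double converts at once.
startTimes = str2double({seizures.startTime}) ;   % 1x2 double
endTimes   = str2double({seizures.endTime}) ;     % 1x2 double
durations  = endTimes - startTimes ;              % seizure lengths in seconds
```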
EDIT: note that the regexp inside the IF statement in the FOR loop is extracting data using named tokens. I did that for the example, but you could also go for a more common approach, e.g.
if results{bId}.nSeizures > 0
    startTimes = regexp(blocks{bId}, '(?<=Seizure Start Time:\s*)\d+', 'match') ;
    endTimes = regexp(blocks{bId}, '(?<=Seizure End Time:\s*)\d+', 'match') ;
    results{bId}.seizures = str2double([startTimes; endTimes].') ;
end
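As a side note on the 'match' option used throughout: when regexp is called without an output option, it returns the start indices of the matches rather than the matched text, and an empty result when nothing matches. This is probably related to the empty values seen in the question. A minimal sketch on an inline string, which is assumed to mirror one line of the summary file:

```matlab
% Sketch: compare regexp outputs with and without the 'match' option.
txt = 'File Name: chb01_03.edf' ;

idx  = regexp(txt, '\S+(?=\.edf)') ;            % no option: start index of the match
name = regexp(txt, '\S+(?=\.edf)', 'match') ;   % 'match': matched text, in a cell array
none = regexp(txt, '^Time: \d{2}', 'match') ;   % pattern absent from txt: empty cell
```

Here idx is a position in txt, name holds the text 'chb01_03', and none is empty, so checking isempty on the result is a simple way to test whether a line contains the field.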