I would go for a solution close to what you implemented, but splitting the file content first into blocks of data. For example:
% Read the whole file and split it into one block per 'File Name' entry.
buffer = fileread('chb01-summary.txt') ;
blocks = regexp(buffer, 'File Name', 'split') ;
if length(blocks) < 2, error('No content found in file.') ; end
blocks = blocks(2:end) ;   % drop the header that precedes the first entry
nBlocks = length(blocks) ;
results = cell(nBlocks, 1) ;
for bId = 1 : nBlocks
    % 'match' returns a cell array, so these fields are 1x1 cells of strings.
    results{bId}.fileName = regexp(blocks{bId}, '\S+(?=\.edf)', 'match') ;
    results{bId}.fileStartTime = regexp(blocks{bId}, ...
        '(?<=File Start Time:\s*)\S+', 'match') ;
    results{bId}.fileEndTime = regexp(blocks{bId}, ...
        '(?<=File End Time:\s*)\S+', 'match') ;
    results{bId}.nSeizures = str2double( regexp(blocks{bId}, ...
        '(?<=in File:\s*)\d+', 'match') ) ;
    if results{bId}.nSeizures > 0
        % Named tokens: one struct array element per start/end pair.
        results{bId}.seizures = regexp(blocks{bId}, ...
            'Seizure Start Time: (?<startTime>\d+).+?Seizure End Time: (?<endTime>\d+)', 'names') ;
    end
end
With that, you get:
>> results
results =
[1x1 struct]
[1x1 struct]
>> results{1}
ans =
fileName: {'chb01_02'}
fileStartTime: {'12:42:57'}
fileEndTime: {'13:42:57'}
nSeizures: 0
>> results{2}
ans =
fileName: {'chb01_03'}
fileStartTime: {'13:43:04'}
fileEndTime: {'14:43:04'}
nSeizures: 1
seizures: [1x1 struct]
>> results{2}.seizures
ans =
startTime: '2996'
endTime: '3036'
What is left is probably just converting the relevant times to numeric values.
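As a rough sketch of those conversions, assuming the times are stored as shown above (clock times as 'HH:MM:SS' strings inside 1x1 cells, seizure offsets as digit strings), something like the following could work. The variable r and the names fileStartSec, seizureStart, seizureEnd and seizureDuration are made up for the illustration; r just mimics one element of results.

```matlab
% Hypothetical sample mimicking results{2} from the loop above.
r.fileStartTime = {'13:43:04'} ;
r.seizures = struct('startTime', {'2996', '2997'}, ...
                    'endTime',   {'3036', '3037'}) ;

% Clock time 'HH:MM:SS' -> seconds since midnight.
hms = sscanf(r.fileStartTime{1}, '%d:%d:%d') ;   % 3x1 vector [h; m; s]
fileStartSec = [3600, 60, 1] * hms ;

% Seizure offsets: collect the char fields of the struct array into a
% cell array, then convert them all at once.
seizureStart = str2double({r.seizures.startTime}) ;   % 1xN double
seizureEnd   = str2double({r.seizures.endTime}) ;
seizureDuration = seizureEnd - seizureStart ;         % in seconds
```

Whether seconds since midnight is the right representation for the file start and end times depends on what you do with them afterwards; it is only one possible choice.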
Note that results{k}.seizures is a struct array, so if the 2nd entry in your file had been
File Name: chb01_03.edf
File Start Time: 13:43:04
File End Time: 14:43:04
Number of Seizures in File: 2
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds
Seizure Start Time: 2997 seconds
Seizure End Time: 3037 seconds
the seizure times would be accessible through:
>> results{2}.seizures
ans =
1x2 struct array with fields:
startTime
endTime
>> results{2}.seizures(1)
ans =
startTime: '2996'
endTime: '3036'
>> results{2}.seizures(2)
ans =
startTime: '2997'
endTime: '3037'
>> results{2}.seizures(2).startTime
ans =
2997
EDIT: note that the regexp inside the IF statement in the FOR loop is extracting data using named tokens. I did that for the example, but you could also go for a more common approach, e.g.
if results{bId}.nSeizures > 0
    startTimes = regexp(blocks{bId}, '(?<=Seizure Start Time:\s*)\d+', 'match') ;
    endTimes = regexp(blocks{bId}, '(?<=Seizure End Time:\s*)\d+', 'match') ;
    results{bId}.seizures = str2double([startTimes; endTimes].') ;
end
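With this variant, seizures is a numeric array with one row per seizure (start time in column 1, end time in column 2) rather than a struct array. A small self-contained check, using a made-up block string that mimics the file layout (block and durations are names introduced only for this illustration):

```matlab
% Hypothetical block text with two seizures, mimicking the file layout.
block = sprintf(['Number of Seizures in File: 2\n', ...
    'Seizure Start Time: 2996 seconds\nSeizure End Time: 3036 seconds\n', ...
    'Seizure Start Time: 2997 seconds\nSeizure End Time: 3037 seconds\n']) ;

startTimes = regexp(block, '(?<=Seizure Start Time:\s*)\d+', 'match') ;
endTimes   = regexp(block, '(?<=Seizure End Time:\s*)\d+', 'match') ;

% Stack the two 1xN cell arrays, transpose to Nx2, convert to double.
seizures = str2double([startTimes; endTimes].') ;

% Numeric columns can be used directly, e.g. durations in seconds.
durations = seizures(:,2) - seizures(:,1) ;
```

The numeric form is convenient for arithmetic, while the named-token form keeps the fields self-describing.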