MATLAB: Read text file line by line, and then store the information into a struct

regexp, struct, tline

Hi everyone,

I am trying to read a file that looks something like this:

    Data Sampling Rate: 256 Hz
    ***********************

    Channels in EDF Files:
    ********************
    Channel 1: FP1-F7
    Channel 2: F7-T7
    Channel 3: T7-P7
    Channel 4: P7-O1

    File Name: chb01_02.edf
    File Start Time: 12:42:57
    File End Time: 13:42:57
    Number of Seizures in File: 0

    File Name: chb01_03.edf
    File Start Time: 13:43:04
    File End Time: 14:43:04
    Number of Seizures in File: 1
    Seizure Start Time: 2996 seconds
    Seizure End Time: 3036 seconds

So far I have:

    fid1= fopen('chb01-summary.txt')
    data=struct('id',{},'stime',{},'etime',{},'seizenum',{},'sseize',{},'eseize',{});
    if fid1 ==-1
        error('File cannot be opened ')
    end
    tline= fgetl(fid1);
    while ischar(tline)
        i=1;
        disp(tline);

I want to use regexp to find the expressions, and so far I have:

    line1= '(.*\d{2} (\.edf)' 
    data{1} = regexp(tline, line1);
    tline=fgetl(fid1);
    time = '^Time: .*\d{2]}: \d{2} :\d{2}' ;
    data{2}= regexp(tline,time);
    tline=getl(fid1);
    seizure = '^File: .*\d';
    data{4}= regexp(tline,seizure);
    if data{4}>0
        stime = '^Time: .*\d{5}'; 
        tline=getl(fid1);
        data{5}= regexp(tline,seizure);
        tline= getl(fid1);
        data{6}= regexp(tline,seizure);
    end

And I tried using a loop to find the line at which the file name starts:

    if true
        for (firstline<1) (firstline>1 )
            firstline= strfind(tline, 'File Name')
            tline=fgetl(fid1);
        end
    end

And then I am stumped. Say I am at the line that holds the information: how do I store it with regexp? I got data = [] [] after running the code once…

Thanks in advance.

Best Answer

I would go for a solution close to what you implemented, but first splitting the file content into blocks of data, one block per "File Name" entry. For example:

 buffer = fileread('chb01-summary.txt') ;
 blocks = regexp(buffer, 'File Name', 'split') ;
 if length(blocks) < 2,  error('No content found in file.') ;  end
 blocks  = blocks(2:end) ;                           % 1st block is header.
 nBlocks = length(blocks) ;
 results = cell(nBlocks, 1) ;
 for bId = 1 : nBlocks  
    results{bId}.fileName = regexp(blocks{bId}, '\S+(?=\.edf)', 'match') ;
    results{bId}.fileStartTime = regexp(blocks{bId}, ...
        '(?<=File Start Time:\s*)\S+', 'match') ;
    results{bId}.fileEndTime = regexp(blocks{bId}, ...
        '(?<=File End Time:\s*)\S+', 'match') ;
    results{bId}.nSeizures = str2double( regexp(blocks{bId}, ...
        '(?<=in File:\s*)\d+', 'match') ) ;
    if results{bId}.nSeizures > 0
        results{bId}.seizures = regexp(blocks{bId}, ...
            'Seizure Start Time: (?<startTime>\d+).+?Seizure End Time: (?<endTime>\d+)', 'names') ;
    end
 end

With that, you get:

 >> results
 results = 
    [1x1 struct]
    [1x1 struct]
 >> results{1}
 ans = 
         fileName: {'chb01_02'}
    fileStartTime: {'12:42:57'}
      fileEndTime: {'13:42:57'}
        nSeizures: 0
 >> results{2}
 ans = 
         fileName: {'chb01_03'}
    fileStartTime: {'13:43:04'}
      fileEndTime: {'14:43:04'}
        nSeizures: 1
         seizures: [1x1 struct]
 >> results{2}.seizures
 ans = 
    startTime: '2996'
      endTime: '3036'

and what is left is probably a few conversions to numeric for relevant times.
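
A minimal sketch of those conversions, assuming the `results` layout shown above. The struct `r` is only a stand-in for `results{2}` with values copied from the output, and the names `toSeconds`, `startSec`, `endSec` and `fileDuration` are illustrative, not part of the original code. Clock times are converted to seconds since midnight by splitting on `:`.

```matlab
% Stand-in for results{2} from the output above (values copied from it).
r.fileStartTime = {'13:43:04'} ;
r.fileEndTime   = {'14:43:04'} ;
r.seizures      = struct('startTime', '2996', 'endTime', '3036') ;

% 'HH:MM:SS' -> seconds since midnight: split on ':' and weight h/m/s.
toSeconds = @(hms) str2double(regexp(hms, ':', 'split')) * [3600; 60; 1] ;
startSec  = toSeconds(r.fileStartTime{1}) ;
endSec    = toSeconds(r.fileEndTime{1}) ;

% ASSUMPTION: start and end fall on the same day; a recording that crosses
% midnight would need an extra 24 h correction.
fileDuration = endSec - startSec ;

% Seizure offsets are stored as char arrays; convert them to double.
seizureStart = str2double(r.seizures.startTime) ;
seizureEnd   = str2double(r.seizures.endTime) ;
```

The `{1}` indexing is needed because `regexp` with the `'match'` option returns a cell array; adding the `'once'` option to those calls would return the char array directly.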

Note that results{k}.seizures is a struct array, so if the 2nd entry in your file had been

 File Name: chb01_03.edf
 File Start Time: 13:43:04
 File End Time: 14:43:04
 Number of Seizures in File: 2
 Seizure Start Time: 2996 seconds
 Seizure End Time: 3036 seconds
 Seizure Start Time: 2997 seconds
 Seizure End Time: 3037 seconds

the seizure times would be accessible through:

 >> results{2}.seizures
 ans = 
 1x2 struct array with fields:
    startTime
    endTime
 >> results{2}.seizures(1)
 ans = 
    startTime: '2996'
      endTime: '3036'
 >> results{2}.seizures(2)
 ans = 
    startTime: '2997'
      endTime: '3037'
 >> results{2}.seizures(2).startTime
 ans =
 2997
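
To work with all seizures of a file at once rather than one element at a time, the fields of the struct array can be gathered into a cell array and converted in one call. This is a small sketch in which `seizures` is a stand-in for `results{2}.seizures` from the two-seizure example, and `startTimes`, `endTimes` and `durations` are illustrative names.

```matlab
% Stand-in for results{2}.seizures in the two-seizure example above.
seizures = struct('startTime', {'2996', '2997'}, ...
                  'endTime',   {'3036', '3037'}) ;

% {seizures.startTime} collects the field from every element into a cell
% array, which str2double then converts element by element.
startTimes = str2double({seizures.startTime}) ;   % 1x2 double
endTimes   = str2double({seizures.endTime}) ;     % 1x2 double
durations  = endTimes - startTimes ;              % seizure lengths in seconds
```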

EDIT: note that the regexp inside the IF statement in the FOR loop is extracting data using named tokens. I did that for the example, but you could also go for a more common approach, e.g.

 if results{bId}.nSeizures > 0
     startTimes = regexp(blocks{bId}, '(?<=Seizure Start Time:\s*)\d+', 'match') ;
     endTimes = regexp(blocks{bId}, '(?<=Seizure End Time:\s*)\d+', 'match') ;
     results{bId}.seizures = str2double([startTimes; endTimes].') ;
 end
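
If you prefer to stay with a line-by-line loop as in the question, one possible sketch is below. Called without an option, `regexp` returns the start index of each match (empty when nothing matches), not the matched text; asking for `'tokens'` returns the captured text instead. The field names follow the struct from the question. The temporary sample file, the `label: value` pattern and the variables `fname`, `k` and `tok` are assumptions made so the example runs on its own.

```matlab
% Write a small sample (copied from the question) so the sketch is runnable;
% in practice, set fname = 'chb01-summary.txt' and skip this part.
fname = [tempname, '.txt'] ;
fid = fopen(fname, 'w') ;
fprintf(fid, '%s\n', 'File Name: chb01_02.edf', 'File Start Time: 12:42:57', ...
    'File End Time: 13:42:57', 'Number of Seizures in File: 0', ...
    'File Name: chb01_03.edf', 'File Start Time: 13:43:04', ...
    'File End Time: 14:43:04', 'Number of Seizures in File: 1', ...
    'Seizure Start Time: 2996 seconds', 'Seizure End Time: 3036 seconds') ;
fclose(fid) ;

data = struct('id', {}, 'stime', {}, 'etime', {}, 'seizenum', {}, ...
              'sseize', {}, 'eseize', {}) ;
k = 0 ;                                  % index of the current record

fid = fopen(fname, 'r') ;
if fid == -1, error('File cannot be opened.') ; end
tline = fgetl(fid) ;
while ischar(tline)                      % fgetl returns -1 at end of file
    % Capture the label before the first ':' and the first word after it.
    tok = regexp(tline, '^\s*([^:]+):\s*(\S+)', 'tokens', 'once') ;
    if ~isempty(tok)
        switch strtrim(tok{1})
            case 'File Name'             % a new record starts here
                k = k + 1 ;
                data(k).id = tok{2} ;
            case 'File Start Time'
                data(k).stime = tok{2} ;
            case 'File End Time'
                data(k).etime = tok{2} ;
            case 'Number of Seizures in File'
                data(k).seizenum = str2double(tok{2}) ;
            case 'Seizure Start Time'    % append, a file may hold several
                data(k).sseize(end+1) = str2double(tok{2}) ;
            case 'Seizure End Time'
                data(k).eseize(end+1) = str2double(tok{2}) ;
        end                              % other labels (header) are ignored
    end
    tline = fgetl(fid) ;
end
fclose(fid) ;
delete(fname) ;                          % remove the temporary sample file
```

Dispatching on the label with `switch` avoids depending on the order or number of lines in each record, which is what made the hard-coded sequence of `fgetl` calls fragile.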
