MATLAB: How to extract only numerical values from cell into new column array


For this data:

"chb12_06.edf"
"1665 seconds"
"1726 seconds"
"3415 seconds"
"3447 seconds"
"chb12_08.edf"
"1426 seconds"
"1439 seconds"
"1591 seconds"
"1614 seconds"
"1957 seconds"
"1977 seconds"
"2798 seconds"
"2824 seconds"
"chb12_09.edf"
"3082 seconds"
"3114 seconds"
"3503 seconds"
"3535 seconds"
"chb12_10.edf"
"593 seconds"
"625 seconds"
"811 seconds"
"856 seconds"

This is a column in a larger cell array. I want to extract just the seconds values that follow each edf file, and possibly put them in a new array or matrix, e.g.:

{"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" ; "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"...}

I hope it is clear what I'm trying to achieve; if not, please ask and I will clarify. Thanks!

Best Answer

s={"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"};
sc = cellfun(@char,s,'UniformOutput',false); % cast each element to char so regexp accepts the cell array
sm = regexp(sc,'^\d+ seconds','match');      % one match per time entry, empty cell for the file names
[sm{:}]                                      % concatenate the non-empty matches into one row cell array
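
The snippet above returns every "N seconds" entry in one flat list. If the goal is numeric values grouped under the file they follow, something like the sketch below may help. It assumes the column always starts with a file name, that file rows end in ".edf", and that time rows have the form "N seconds". The variable names (col, txt, isFile, groupId, secs, times) are made up for this illustration, and the data is a shortened copy of the column in the question.

```matlab
% Shortened copy of the column from the question (string scalars in a cell array).
col = {"chb12_06.edf"; "1665 seconds"; "1726 seconds"; ...
       "chb12_08.edf"; "1426 seconds"; "1439 seconds"; "1591 seconds"; "1614 seconds"};

txt     = vertcat(col{:});           % N-by-1 string array
isFile  = endsWith(txt, ".edf");     % true on the file-name rows (assumed naming)
groupId = cumsum(isFile);            % each row gets the index of the file above it

% Strip the " seconds" suffix and convert the remaining text to numbers.
secs = str2double(extractBefore(txt(~isFile), " seconds"));

fileNames = txt(isFile);
% Collect the numbers of each file into one cell; groupId is sorted, so the
% original order inside each group is kept.
times = accumarray(groupId(~isFile), secs, [numel(fileNames) 1], @(v) {v.'});
```

Here times{k} holds the numeric seconds for fileNames(k). A cell array is used because the files have different numbers of entries, so the result does not fit a rectangular matrix.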

Related Solutions

MATLAB: What is the fastest way to convert a cell array of delimited numbers into a matrix

Converting with str2num(char(a)) works on a small cell array:

a={'1,5012,0,35,6';'2,395,1,35,8'};
b=str2num(char(a))
b =
           1        5012           0          35           6
           2         395           1          35           8

Timing the same call on 500,000 rows:

a={'1,5012,0,35,6';'2,395,1,35,8'};
aa=repmat(a,250000,1);
tic;
b=str2num(char(aa));
toc

Elapsed time is 19.681015 seconds.
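
One alternative to str2num for this kind of input is to join all rows into a single char vector and parse it once with sscanf. This is only a sketch: it assumes every row has the same number of comma-separated fields (nCols is a name introduced here), and no timing is claimed for it, so measure it with tic/toc on your own data.

```matlab
a = {'1,5012,0,35,6'; '2,395,1,35,8'};
nCols = 5;                           % assumed: identical field count in every row

joined = strjoin(a.', ',');          % '1,5012,0,35,6,2,395,1,35,8'
% sscanf fills its output column by column, so read nCols values per column
% and transpose to get one row per original cell.
b = sscanf(joined, '%f,', [nCols Inf]).';
```

If the rows can have different lengths, this reshaping no longer applies and each row has to be parsed separately.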

MATLAB: Read text file line by line, and then store the information into a struct

I would go for a solution close to what you implemented, but splitting the file content first into blocks of data. For example:

 buffer = fileread('chb01-summary.txt') ;
 blocks = regexp(buffer, 'File Name', 'split') ;
 if length(blocks) < 2,  error('No content found in file.') ;  end
 blocks  = blocks(2:end) ;                           % 1st block is header.
 nBlocks = length(blocks) ;
 results = cell(nBlocks, 1) ;
 for bId = 1 : nBlocks  
    results{bId}.fileName = regexp(blocks{bId}, '\S+(?=\.edf)', 'match') ;
    results{bId}.fileStartTime = regexp(blocks{bId}, ...
        '(?<=File Start Time:\s*)\S+', 'match') ;
    results{bId}.fileEndTime = regexp(blocks{bId}, ...
        '(?<=File End Time:\s*)\S+', 'match') ;
    results{bId}.nSeizures = str2double( regexp(blocks{bId}, ...
        '(?<=in File:\s*)\d+', 'match') ) ;
    if results{bId}.nSeizures > 0
        results{bId}.seizures = regexp(blocks{bId}, ...
            'Seizure Start Time: (?<startTime>\d+).+?Seizure End Time: (?<endTime>\d+)', 'names') ;
    end
 end

With that, you get:

 >> results
 results = 
    [1x1 struct]
    [1x1 struct]
 >> results{1}
 ans = 
         fileName: {'chb01_02'}
    fileStartTime: {'12:42:57'}
      fileEndTime: {'13:42:57'}
        nSeizures: 0
 >> results{2}
 ans = 
         fileName: {'chb01_03'}
    fileStartTime: {'13:43:04'}
      fileEndTime: {'14:43:04'}
        nSeizures: 1
         seizures: [1x1 struct]
 >> results{2}.seizures
 ans = 
    startTime: '2996'
      endTime: '3036'

and what is left is probably a few conversions to numeric for relevant times.
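
As a rough illustration of those conversions, the sketch below turns the char fields of a seizures struct array into numbers with str2double. The seizures variable here is a hand-built stand-in for results{k}.seizures, not output from the actual file, and startSec, endSec and durations are names chosen for the example.

```matlab
% Hypothetical stand-in for results{k}.seizures as returned by the 'names' option.
seizures = struct('startTime', {'2996', '2997'}, 'endTime', {'3036', '3037'});

% {seizures.startTime} gathers the field from every element into a cell array
% of char vectors, which str2double converts in a single call.
startSec  = str2double({seizures.startTime});
endSec    = str2double({seizures.endTime});
durations = endSec - startSec;       % seizure lengths in seconds
```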

Note that results{k}.seizures is a struct array, so if the 2nd entry in your file had been

 File Name: chb01_03.edf
 File Start Time: 13:43:04
 File End Time: 14:43:04
 Number of Seizures in File: 2
 Seizure Start Time: 2996 seconds
 Seizure End Time: 3036 seconds
 Seizure Start Time: 2997 seconds
 Seizure End Time: 3037 seconds

the seizure times would be accessible through:

 >> results{2}.seizures
 ans = 
 1x2 struct array with fields:
    startTime
    endTime
 >> results{2}.seizures(1)
 ans = 
    startTime: '2996'
      endTime: '3036'
 >> results{2}.seizures(2)
 ans = 
    startTime: '2997'
      endTime: '3037'
 >> results{2}.seizures(2).startTime
 ans =
 2997

EDIT: note that the regexp inside the IF statement in the FOR loop is extracting data using named tokens. I did that for the example, but you could also go for a more common approach, e.g.

 if results{bId}.nSeizures > 0
     startTimes = regexp(blocks{bId}, '(?<=Seizure Start Time:\s*)\d+', 'match') ;
     endTimes = regexp(blocks{bId}, '(?<=Seizure End Time:\s*)\d+', 'match') ;
     results{bId}.seizures = str2double([startTimes; endTimes].') ;
 end
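
To compare the two approaches side by side, here is a small sketch that runs both on one block of text. The block variable is built by hand from the two-seizure sample shown earlier rather than read from the summary file, and the names named and seizures are only for this example.

```matlab
% Hand-built block mirroring the two-seizure sample entry.
block = sprintf(['Number of Seizures in File: 2\n' ...
    'Seizure Start Time: 2996 seconds\nSeizure End Time: 3036 seconds\n' ...
    'Seizure Start Time: 2997 seconds\nSeizure End Time: 3037 seconds\n']);

% Named tokens: one struct element per start/end pair, fields hold char vectors.
named = regexp(block, ...
    'Seizure Start Time: (?<startTime>\d+).+?Seizure End Time: (?<endTime>\d+)', 'names');

% Plain matches: two cell arrays, stacked and converted to an N-by-2 numeric matrix.
startTimes = regexp(block, '(?<=Seizure Start Time:\s*)\d+', 'match');
endTimes   = regexp(block, '(?<=Seizure End Time:\s*)\d+', 'match');
seizures   = str2double([startTimes; endTimes].');
```

The named-token form keeps each pair labelled but stores text, while the matrix form is already numeric, with start times in column 1 and end times in column 2.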
