MATLAB: Textscan difficulties with mixed datatypes

I am having difficulty solving a particular problem. I might just be missing the wood for the trees but here goes:

I have a large (> 1 million rows) cellstr in the following format (only a 3-row example shown):

    blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

I then attempt to textscan through each cellstr (for loop, as textscan is not "vectorized" for cellstr) using one of the following two syntaxes:

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',0)

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',1)

Now, the problem is that temp comes out as a cell that contains cells and matrices, i.e. indexing within indexing on different datatypes. I can't afford to index each one individually inside the loop (large dataset, as mentioned), but I need the output to come out as:

    ans =
    'record1'    [  2]    [  3]    'string4'    's5'  
    'rec2'       [ 22]    [ 33]    'str4'       'str5'
    'r3'         [222]    [333]    's4'         'st5'

[Edited for clarity (hopefully)]: Instead I get something like (CollectOutput is false):

    ans =

    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}

or (CollectOutput is true):

    ans =

    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}

With CollectOutput set to false I would expect to see the output I described above. Instead I get cells nested within a cell, which makes any indexing very difficult.

I hope this makes sense. I'm sure I'm missing something simple.

PS: I think textscan is inconsistent, because when you read the same example from an actual file (instead of a cellstr) it produces exactly the outcome I want, without any for loop or indexing.

Regards, Phillip

Best Answer

Why do you get the CSV content as a cell array of rows? If you cannot change this, you could concatenate all the rows, inserting a line break after each one, and call TEXTSCAN once on the whole string.

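    % Pair every row with a newline (blockCSV must be a column cellstr), then
    % transpose so that merger{:} expands as row1, \n, row2, \n, ...
    merger = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].' ;
    data   = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',') ;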

with that you get

    >> data
    data = 
       {3x1 cell}    [3x1 double]    [3x1 double]    {3x1 cell}    {3x1 cell}

which is most appropriate memory-wise and for further indexing, as numeric entries are stored in numeric arrays, and non-numeric entries in cell arrays.
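
If you really do need the row-wise cell array shown in the question, the column-wise result can be converted in one step after the single TEXTSCAN call. The following is only a minimal sketch: it assumes a second copy of the data fits in memory, it uses strjoin (R2013a or later) as an alternative to the repmat construction, and the variable names `merged` and `rows` are illustrative, not from the original post.

```matlab
% Example input from the question (column cellstr).
blockCSV = {'record1,2,3,string4,s5'; 'rec2,22,33,str4,str5'; 'r3,222,333,s4,st5'};

% Join all rows into one char vector separated by newlines.
% blockCSV(:).' forces a 1xN cell, which is what strjoin expects.
merged = strjoin(blockCSV(:).', sprintf('\n'));

% One textscan call for the whole block: one output cell per column.
data = textscan(merged, '%s%f%f%s%s', 'Delimiter', ',');

% Column-wise access needs a single level of indexing per field.
firstNames  = data{1};   % 3x1 cell array of strings
secondField = data{2};   % 3x1 double

% Optional: build the N-by-5 mixed cell array from the question.
% num2cell wraps each number in its own cell so it can sit next to strings.
rows = [data{1}, num2cell(data{2}), num2cell(data{3}), data{4}, data{5}];
```

Keep in mind that `rows` stores every number in its own cell, so for more than a million rows it will use considerably more memory than `data`.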
