MATLAB: Importing line-by-line from a text file

text filetextscan

Hi I have a problem choosing what kind of tool to import data from a text file. The data looks something like this although the lines are like 4000 and

35,       0401-BA1,1,6,8,00000,0,01,053,50530,'Channel                 ',,
35,       0401-CB1,1,6,8,00000,0,01,051,50510,'Spillway                ',,
35,       0401-EB1,1,3,2,00052,0,01,051,50510,'RioGrande               ',,
35,       0401-FS1,1,6,8,00203,0,01,011,27340,'Spillway                ',,
35,       0401-MB1,1,6,8,00000,0,01,037,0    ,'Canal                   ',,

I used the fscanf , textscan but I don't understand how to tackle the spaces. for the second space is it like %15c ?? Also How do I switch to another line should I use /n? Please help me here guys. Thank you

Best Answer

Hint

    str = '35,  0401-BA1,1,6,8,00000,0,01,053,50530,''Channel     '',,';
    fmt = '%d%s%d%d%d%d%d%d%d%d%s%s';
    cac = textscan( str, fmt               ...
                ,   'Delimiter'    , ','   ...
                ,   'CollectOutput', true  );

" how to tackle the spaces" &nbsp depends on how you want the spaces in the resulting string variable values
"line-by-line" &nbsp what exactly do you mean?

&nbsp

Addendum I

This illustrates an approach. However, due to various mismatches between the header row and the data rows the result is not useful.

    sas = cssm()

returns

    1x4000 struct array with fields:
        STATE_CODE_001
        STRUCTURE_NUMBER_008
        RECORD_TYPE_005A
        ROUTE_PREFIX_005B
        SERVICE_LEVEL_005C
        ROUTE_NUMBER_005D
        DIRECTION_005E
        HIGHWAY_DISTRICT_002
        COUNTY_CODE_003
        PLACE_CODE_004
        ....

where

    function    sas = cssm 
    ffs = 'c:\m\cssm\NM14.txt';
    fid = fopen( ffs );
    str = fgetl( fid );
    col_head_list = parse_header( str );
    sas = repmat( cell2struct( repmat({''},1,112), col_head_list(1:112), 2 ), 1,4000 ); 
    jj  = 0;
    while not( feof(fid) )
        str = fgetl( fid );
        buf = parse_data_row( str );
        jj  = jj + 1;
        sas(1,jj) = cell2struct( buf(1:112), col_head_list(1:112), 2 );
    end
    fclose( fid );
    end
    function    cac = parse_header( str )
        cac = strsplit( str, ',' );
        cac = strtrim( cac );
    end
    function    cac = parse_data_row( str )
        cac = strsplit( str, ',' );
        cac = strtrim( cac );
    end

&nbsp

Addendum II

I was fooled by the function, strsplit, which as default treats multiple delimiters as one. In my world that only applies to space. Anyhow, now I have fixed the function cssm.

Read the data to a structure

    >> sas=cssm;

find all bridges, which have STATUS_WITH_10YR_RULE equal to '2'.

    >> is_sw10yrEq2 = strcmp('2',{sas.STATUS_WITH_10YR_RULE});
    >> sum(is_sw10yrEq2)
    ans =
       324

there are 324 of them. List their building year

    >> {sas(is_sw10yrEq2).YEAR_BUILT_027}
    ans = 
      Columns 1 through 9
        '1915'   '1936'   '1928'   '1931'   '1920'   '1951'   '1933'   '1939'   '1939'
      Columns 10 through 18
        '1924'   '1931'   '1931'   '1931'   '1932'   '1932'   '1936'   '1937'   '1936'
      ....

where

    function    sas = cssm 
        ffs = 'c:\m\cssm\NM14.txt';
        fid = fopen( ffs );
        str = fgetl( fid );
        colhead_list = parse_header( str );
        n_colhead = length( colhead_list );
        sas = repmat( cell2struct                                   ...
                    (   repmat({''},1,n_colhead), colhead_list, 2 ) ...
                    ,   1,4000                                      ); 
        jj  = 0;
        while not( feof(fid) )
            str = fgetl( fid );
            buf = parse_data_row( str );
            jj  = jj + 1;
            sas(1,jj) = cell2struct( buf, colhead_list, 2 );
        end
        fclose( fid );
        sas(1,jj+1:end) = [];
    end
    function    cac = parse_header( str )
        cac = strsplit( str, ',', 'CollapseDelimiters',false );
        cac = strtrim( cac );
    end
    function    cac = parse_data_row( str )
        cac = strsplit( str, ',', 'CollapseDelimiters',false );
        cac = strtrim( cac );
    end

&nbsp

Addendum III

Caveat: I hardly used the new (couple of years old), data type table. Nevertheless, table is targeted at problems like yours and I think you should

use Import Data interactively to read from a file of format B (one header line, comma separated, varying width) to a table. Make sure to uncheck Multiple Delimiters as One and select table. I've tested; it worked fine.
spend time with the documentation of table instead of fighting file formats.

&nbsp

Addendum IV

It is straightforward to read the file of format B (one header line, comma separated, varying width) with textscan

    >> cac = cssm
    cac = 
        {3951x134 cell}

where

    function    cac = cssm 
        ffs = 'c:\m\cssm\NM14.txt'; % one header line, comma separated, varying width 
        fid = fopen( ffs );
        str = fgetl( fid );
        chl = strtrim( strsplit( str, ',', 'CollapseDelimiters',false ) );
        len = length( chl );
        frm = repmat( '%s', 1, len );
        cac = textscan( fid, frm, 'CollectOutput',true ); 
        fclose( fid );
    end

Related Solutions

MATLAB: Importing cnv file with header lines

Given

the entire file fits in a fraction of the memory
*END* is the last line before the numerical part of the file. (This is the only occurrence of *END*.)
all files have 29 columns of numerical data

then one way is

str = fileread( 'd94i006.txt' );
str = regexp( str, '(?<=\*END\*\s+).+$', 'match' );
cac = textscan( str{1}, repmat('%f',[1,29]), 'CollectOutput',true );
num = cac{1};

result

>> whos num
  Name        Size            Bytes  Class     Attributes
  num       339x29            78648  double

Note: "has about 130 header lines." makes it safer to use the line *END*.

In response to comment "loop [...] every file from the folder?"

Try

>> cac = cssm( 'c:\your_folder\with_data\' );

where in one file

function    cac = cssm( folderspec )
    sas = dir( fullfile( folderspec, '*.txt' ) );
    len = length( sas );
    cac = cell( 1, len );
    for jj = 1 : len
        cac{jj} = cssm_( fullfile( folderspec, sas(jj).name ) );
    end
end
function    num = cssm_( filespec )
    str = fileread( filespec );
    str = regexp( str, '(?<=\*END\*\s+).+$', 'match' );
    cac = textscan( str{1}, repmat('%f',[1,29]), 'CollectOutput',true );
    num = cac{1};
end

As is; Not tested Needed: better names and some comments. There is a magic number, 29, in the code.

MATLAB: Concatenate data using fgets

You are closing the file in the loop. close the file after the loop. I assume the below should work.

function read_datapattern(filename)
fid = fopen(filename,'rt');
if fid < 0
  error('error opening file %s\n\n',filename);
end
pattern = fgets(fid);
for n = 1:32767
  nextline = fgets(fid);
  pattern = strcat(pattern, nextline);
fprintf(pattern);
fprintf('\n');
end
fclose(fid);

Best Answer

Related Solutions

MATLAB: Importing cnv file with header lines

MATLAB: Concatenate data using fgets

Related Question