MATLAB: Using textscan with mixed data type in a single field/array

Hello,

I am having trouble reading a large (~30,000 rows) text file into Matlab. The data looks something like this:


 1) 1996/01/01 00:00:00     -99.000N    -99.000N 
 2) 1996/01/01 00:15:00     -99.000N    -99.000N 
 3) 1996/01/01 00:30:00     -99.000N    -99.000N 
 4) 1996/01/01 00:45:00     -99.000N    -99.000N 
 5) 1996/01/01 01:00:00     -99.000N    -99.000N

skipped rows

 16455) 1996/06/20 09:30:00     -99.000N    -99.000N 
 16456) 1996/06/20 09:45:00     -99.000N    -99.000N 
 16457) 1996/06/20 10:00:00     -99.000N    -99.000N 
 16458) 1996/06/20 10:15:00       1.869T      0.088T 
 16459) 1996/06/20 10:30:00       1.892       0.083  
 16460) 1996/06/20 10:45:00       1.913      -0.082  
 16461) 1996/06/20 11:00:00       1.913      -0.064  
 16462) 1996/06/20 11:15:00       1.895       0.035

I use textscan to read in the data like this:

textFilename = [year,SID,'.txt'];
fid = fopen(textFilename, 'rt');
C = textscan(fid, '%*s%d/%d/%d%d%c%d%c%d%f%c%f%c','Headerlines',11);

The problem (as you can see from the data) is that some of the values in the last two columns have a letter attached. Because the letter is not present in every row, when I read it as a character (%c), textscan moves on in the rows where it is missing and consumes the '-' sign of the next number instead. As a result, values in the fourth column are read as positive when they are actually negative.

My question is: how can I tell textscan to read the values from the last two columns while somehow separating out the letters…

Any and all help greatly appreciated!

Ozgun

 content = fileread( 'myFile.txt' ) ;
 % Build cell array of entries.
 pattern = '([\d]+)\)\s+([\d\s:/]{19})\s+([\d\-.]+)([NT]?)\s+([\d\-.]+)([NT]?)' ;
 tokens  = regexp( content, pattern, 'tokens' ) ;
 tokens  = reshape( [tokens{:}], numel( tokens{1} ), [] ).' ;
 % Convert columns into numeric, string, and time data.
 numData = str2double( tokens(:,[1,3,5]) ) ;               % Row ID, 1st coord, 2nd coord.
 strData = tokens(:, [4,6]) ;                              % 1st and 2nd N, T, or empty.
 timData = datevec( tokens(:,2), 'yyyy/mm/dd HH:MM:SS' ) ;

Best Answer

If you don't need N and T, the simplest approach is probably to eliminate them before the call to TEXTSCAN:

 content       = fileread( 'myFile.txt' ) ;
 isNT          = content == 'N' | content == 'T' ;
 content(isNT) = ' ' ;                             % Replace with white space.

then you can TEXTSCAN type-homogeneous columns:

 C = textscan( content, ... ) ;
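As a minimal sketch of what the elided call might look like for the rows shown in the question: the format string below is an assumption (it is not given in the answer), and the sample rows are built inline with SPRINTF as a stand-in for the FILEREAD output.

```matlab
% Stand-in for content = fileread( 'myFile.txt' ): sample rows from the question.
content = sprintf( '%s\n', ...
    ' 16457) 1996/06/20 10:00:00     -99.000N    -99.000N ', ...
    ' 16458) 1996/06/20 10:15:00       1.869T      0.088T ', ...
    ' 16459) 1996/06/20 10:30:00       1.892       0.083  ', ...
    ' 16460) 1996/06/20 10:45:00       1.913      -0.082  ' ) ;

% Blank out the N/T flags so that every column is type-homogeneous.
isNT          = content == 'N' | content == 'T' ;
content(isNT) = ' ' ;

% Assumed format: skip the row label ("16457)"), keep date and time as
% strings, and read the two value columns as doubles.
C = textscan( content, '%*s %s %s %f %f' ) ;

timeStr = strcat( C{1}, {' '}, C{2} ) ;   % Date and time joined per row.
val1    = C{3} ;                          % 1st value column.
val2    = C{4} ;                          % 2nd value column, signs preserved.
```

For the real file, the 'HeaderLines' option from the question would presumably still be needed to skip the 11 header lines.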

Note that the content variable is passed as the first argument: TEXTSCAN accepts either a file identifier or a string. If you need N and T, we can talk about the post-processing mentioned in my comment above (no time now, but I'll come back later tonight).
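In case the flags do matter, here is one possible sketch of such post-processing. It is not necessarily what the comment had in mind: it assumes the two value columns are read as strings with %s and the trailing letter is split off afterwards with REGEXP and REGEXPREP. The sample rows are again an inline stand-in for the file content.

```matlab
% Stand-in for content = fileread( 'myFile.txt' ): sample rows from the question.
content = sprintf( '%s\n', ...
    ' 16457) 1996/06/20 10:00:00     -99.000N    -99.000N ', ...
    ' 16458) 1996/06/20 10:15:00       1.869T      0.088T ', ...
    ' 16459) 1996/06/20 10:30:00       1.892       0.083  ', ...
    ' 16460) 1996/06/20 10:45:00       1.913      -0.082  ' ) ;

% Read the two value columns as strings so a trailing flag stays attached.
C   = textscan( content, '%*s %s %s %s %s' ) ;
raw = [ C{3}, C{4} ] ;                                  % N-by-2 cell array.

% Split each entry into its flag (N, T, or empty) and its numeric part.
flags  = regexp( raw, '[NT]$', 'match', 'once' ) ;      % Empty where no flag.
values = str2double( regexprep( raw, '[NT]$', '' ) ) ;  % N-by-2 double array.
```

Because the sign travels with the string, negative values such as -0.082 keep their sign whether or not a flag follows them.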

If you wanted to process the whole file in one shot using REGEXP, the REGEXP snippet shown earlier on this page is an example, but keep in mind that REGEXP is overkill for this operation and will take more time to process than a basic TEXTSCAN.
