MATLAB: Limit to Textscan

import, ragged file, textscan

Hi all, I have been importing multiple data files (typically hundreds of files) quite successfully to Matlab using the textscan function.

Recently, my raw file format has changed (due to a different data acquisition setup). Previously, I had one time column and 20 data columns, and all columns were the same length. But now each data column has its own time column (which does not line up with the times of the other columns), and the data columns all have different lengths. I've made additions to my script so that it also reads in the corresponding times for each data column, but I've now discovered that, for some reason, it doesn't read the whole file. It reads the file until about row 123, even though some columns go up to row 247 and some go up to row 641. So I'm just curious whether this is a limitation of the textscan function, or whether the new code I added is funky.

Best Answer

Thanks for clarifying what your data looks like.

I assume that comma immediately after the '4' is a mistake. You could probably do this with a regexp, because each comma denotes a pair of values. I take it that if the value before the comma is missing then the value after is also missing.

Do you have a fixed number of columns? If so, are the commas always there?

If at least the second condition above is true, then this isn't so bad... You can read pairs of values using regexp:

lines = {'1, 2  3, 4  5,  6'
         '1, 2  3, 4  5,  6'
         '1, 2  3, 4   ,  '
         ' ,    3, 4  , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');

This extracts word-like strings with optional spaces and the obligatory comma.

What you end up with is one cell per row, and within that one cell per pairing. You can manipulate this data as you see fit, convert empty strings or non-numbers to NaN, etc...

I dunno, that's the kind of solution I come up with when I don't want to spend too much time thinking up more complicated clever stuff.
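In case it helps, here is a minimal sketch of how the lines cell array could be filled from a file instead of typed in by hand. The file name and contents are made up for illustration (a temporary file is written first so the snippet runs on its own); with real data you would only need the reading loop.

```matlab
% Hypothetical sample file, written to a temporary location for illustration only.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1, 2  3, 4  5,  6\n');
fprintf(fid, '1, 2  3, 4   ,  \n');
fclose(fid);

% Read the file one line at a time into a column cell array of strings.
fid = fopen(fname, 'r');
lines = {};
tline = fgetl(fid);
while ischar(tline)          % fgetl returns -1 (not a char) at end of file
    lines{end+1, 1} = tline;
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);               % remove the temporary sample file

% Same pair-extracting expression as above, applied to every line at once.
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');
```

Reading line by line means a short or ragged row cannot make the import stop early; each row is parsed independently.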

[EDIT]

The above regexp fails on the fourth line because there's no logic that says if you have the first value you must have the second (and vice versa)... So try this:

toks = regexp(lines, '\s*(\w+)\s*,\s*(\w+)|\s*()\s*,\s*()', 'tokens');
rows = cell(size(toks));
for r = 1:numel(toks)
  rows(r) = { str2double([toks{r}{:}]) };
end

Now you have a cell with one row per line, containing a vector of doubles...

This won't work with other rubbish in your data like % signs, but you can either filter that or allow for it in the regular expression....
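A rough sketch of the "filter it first" option, assuming the only rubbish is stray % signs (the sample lines here are made up):

```matlab
% Made-up sample lines containing stray percent signs.
lines = {'1, 2%  3, 4'
         '5%, 6  7, 8'};

% Strip every '%' before tokenizing; regexprep accepts a cell array of strings.
clean = regexprep(lines, '%', '');

% Extract the value pairs from the cleaned lines.
toks = regexp(clean, '\s*(\w+)\s*,\s*(\w+)', 'tokens');

% Flatten the pairs of the first line into a numeric row vector.
vals = str2double([toks{1}{:}]);
```

Other unwanted characters could be removed the same way by putting them in a character class in the first argument of regexprep.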

And of course, if you know that all your rows are the same length (or force them to be after processing), you can convert the whole rows array to a matrix with cell2mat.
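If the rows come out with different lengths, cell2mat cannot stack them directly. Here is a minimal sketch of one way to force equal lengths, assuming NaN is an acceptable filler for the missing values (the sample rows are made up and stand in for the rows cell built by the loop above):

```matlab
% Made-up ragged result: one numeric row vector per line of the file.
rows = { [1 2 3 4 5 6]
         [1 2 3 4]
         [NaN NaN 3 4 NaN NaN] };

% Find the longest row, then pad every shorter row with NaN on the right.
maxLen = max(cellfun(@numel, rows));
padded = cellfun(@(v) [v, NaN(1, maxLen - numel(v))], rows, ...
                 'UniformOutput', false);

% All rows now have the same length, so they can be stacked into a matrix.
M = cell2mat(padded);
```

Padding on the right is only one choice; if the missing values belong to specific columns, the padding would have to be placed accordingly.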

Related Solutions

MATLAB: Error using textscan. Delimiter must be a string.

R2012a did not support a cell array of delimiter strings for textscan.

You might need to break the lines apart some other way, such as by using regexp()
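For example, something along these lines splits a string on several delimiters at once. This is only a sketch; the delimiter set (comma, semicolon, whitespace) and the sample string are assumptions, not taken from the original question.

```matlab
% Made-up sample string using three different delimiters.
str = 'a,b;c d';

% Split on any run of commas, semicolons or whitespace.
parts = regexp(str, '[,;\s]+', 'split');
```

The result is a cell array of strings, one element per field, which can then be converted with str2double where numbers are expected.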

MATLAB: CSV file import with timestamp

To import formatted data, textscan is usually a good option.

Textscan requires you to define a format. Assuming you want to separate all the elements in the file (value, every component from the timestamp separately, id-number), your format could look like this:

fmt = '%f "%u-%u-%u %u:%u:%u+%u" %u';

In this format, the '%f' and '%u' represent the data types that MATLAB is supposed to use for the data it reads (floating point and unsigned integer). All the other characters (the double quote, hyphen, colon and plus sign) are 'literals'. They tell MATLAB that it will literally find that character (for example a colon) in the file and that it should be ignored.

In addition, you will need to specify in the textscan command that the delimiter is a comma:

textscan(fid,fmt,'delimiter',',')

Hope this helps

Marlies
