MATLAB: Limit to Textscan


Hi all, I have been importing multiple data files (typically hundreds at a time) into MATLAB quite successfully using the textscan function.
Recently, my raw file format has changed (due to a different data acquisition setup). Previously, I had one time column and 20 data columns, all of the same length. Now each data column has its own time column (which does not line up with the other data), and the data columns all have different lengths. I've added code to my script so that it also reads in the times for each data column, but I've discovered that, for some reason, it no longer reads the whole file: it stops at about row 123, even though some columns go up to row 247 and some go up to 641. So I'm curious whether this is a limitation of the textscan function, or whether the new code I added is funky.
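One plausible cause, not confirmed in this thread: textscan has no row limit anywhere near this size, but it stops reading at the first field it cannot convert with the given format and returns only what it read up to that point. A minimal sketch for checking whether that is happening uses the second output of textscan (the position where scanning ended). The sample text is hypothetical; its third row contains a field that '%f %f' cannot read.

```matlab
% Hypothetical input: the 'x' on the third row cannot be read with '%f %f'
txt = sprintf('1 2\n3 4\nx 6\n7 8\n');

% The second output is the position where scanning ended
[C, position] = textscan(txt, '%f %f');

% If scanning ended before the end of the text, textscan stopped early
stoppedEarly = position < numel(txt);
fprintf('Values read in first column: %d\n', numel(C{1}));
fprintf('Stopped before end of input: %d\n', stoppedEarly);
```

With a real file, the same check works on the file identifier from fopen (position is then the location in the file), and looking at the text just after that position shows which field caused the stop.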

Best Answer

Thanks for clarifying what your data looks like.
I assume that comma immediately after the '4' is a mistake. You could probably do this with a regexp, because each comma denotes a pair of values. I take it that if the value before the comma is missing, then the value after it is also missing.
Do you have a fixed number of columns? If so, are the commas always there?
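A quick way to test the second condition, sketched here on a hypothetical cell array of lines (the variable names are illustrative), is to count the commas on each line and check that the count never changes:

```matlab
% Hypothetical lines; in practice these would come from the data file
lines = {'1, 2 3, 4 5, 6'
         '1, 2 3, 4 , '
         ' , 3, 4 , '};

% Count the commas on each line (one count per cell, returned as a column)
nCommas = cellfun(@(s) sum(s == ','), lines);

% True if every line has the same number of commas
sameCount = all(nCommas == nCommas(1));
```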
If at least the second condition above is true, then this isn't so bad. You can read pairs of values using regexp:
lines = {'1, 2 3, 4 5, 6'
         '1, 2 3, 4 5, 6'
         '1, 2 3, 4 , '
         ' , 3, 4 , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');
This extracts pairs of word-like strings (letters, digits and underscores) separated by the obligatory comma, with optional whitespace around each.
What you end up with is one cell per line, and within that, one cell per pair. You can manipulate this data as you see fit: convert empty strings or non-numbers to NaN, etc.
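The example above hard-codes lines. To apply the same regexp to a real file, one option (a sketch; the temporary file below is a hypothetical stand-in for a data file) is to read the whole file with fileread and split it at line breaks:

```matlab
% Write a small hypothetical sample file so the sketch is self-contained
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1, 2 3, 4 5, 6\n1, 2 3, 4 , \n');
fclose(fid);

% Read the whole file as one character vector, then split it into lines
txt = fileread(fname);
lines = regexp(txt, '\r?\n', 'split').';   % column cell array, one line per cell
lines = lines(~cellfun(@isempty, lines));  % drop the empty piece after the final newline
delete(fname);
```

The resulting lines cell array has the same shape as the hard-coded one, so the regexp call can be applied to it unchanged.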
I dunno, that's the kind of solution I come up with when I don't want to spend too much time thinking up more complicated clever stuff.
[EDIT]
The above regexp fails on the fourth line because there's no logic saying that if you have the first value you must have the second (and vice versa): it reads ' , 3' as an empty value paired with 3. So try this:
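% First branch: a complete pair. Second branch: an empty pair (" , ").
toks = regexp(lines, '\s*(\w+)\s*,\s*(\w+)|\s*()\s*,\s*()', 'tokens');
rows = cell(size(toks));
for r = 1:numel(toks)
    % Flatten this line's pairs into one cell of strings; str2double turns '' into NaN
    rows(r) = { str2double([toks{r}{:}]) };
end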
Now you have a cell array with one element per line, each containing a row vector of doubles (NaN where a value was missing).
This won't work with other rubbish in your data, like % signs, but you can either filter that out or allow for it in the regular expression.
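One further caveat: \w matches only letters, digits and the underscore, so a decimal point or a minus sign is never part of a match. Against a value such as 1.5, the pattern silently returns the pair ('5', ...) instead. A minimal sketch of both remedies, on a hypothetical line: strip % signs with strrep, and use the broader class [^\s,]+ (any run of characters that is neither whitespace nor a comma) in place of \w+.

```matlab
% Hypothetical line containing decimals, a negative value and a % sign
lines = {'1.5, 20% -3, 4e2'};

% Filter: remove % signs before parsing
clean = strrep(lines, '%', '');

% Allow for other characters in the pattern: [^\s,]+ keeps '1.5', '-3' and '4e2' whole
toks = regexp(clean, '([^\s,]+)\s*,\s*([^\s,]+)', 'tokens');
vals = str2double([toks{1}{:}]);   % numeric row vector, one value per token

% For comparison: \w+ cannot match '.', so '1.5, 20' yields the pair ('5', '20')
bad = regexp(lines{1}, '(\w+)\s*,\s*(\w+)', 'tokens');
```

The same substitution should carry over to the first branch of the edited pattern, though that variant is not shown here.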
And of course, if you know that all your rows are the same length (or force them to be after processing), you can convert the whole rows array to a matrix with cell2mat.
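If the rows do not all have the same length, one way to force them to be (a sketch, assuming NaN is an acceptable filler) is to pad each row to the longest length before calling cell2mat. Because each data column in the question has its own time column, the sketch also assumes, hypothetically, that the first value of every pair is the time and the second is the data value, and it collects each channel's valid samples separately:

```matlab
% Hypothetical output of the loop: rows of unequal length
rows = {[1 2 3 4 5 6]; [1 2 3 4]; [NaN NaN 3 4]};

% Pad every row with NaN to the length of the longest row
n = max(cellfun(@numel, rows));
padded = cellfun(@(v) [v, NaN(1, n - numel(v))], rows, 'UniformOutput', false);
M = cell2mat(padded);            % numel(rows)-by-n numeric matrix

% Assumption: in each pair, the first value is a time, the second a data value
t = M(:, 1:2:end);
y = M(:, 2:2:end);

% Keep each channel's valid samples separately, since their lengths differ
nChan = size(t, 2);
chanT = cell(1, nChan);
chanY = cell(1, nChan);
for k = 1:nChan
    ok = ~isnan(t(:, k)) & ~isnan(y(:, k));
    chanT{k} = t(ok, k);
    chanY{k} = y(ok, k);
end
```

chanT{k} and chanY{k} then hold the k-th channel's times and values, each with its own length.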