MATLAB: How to read strings from file with fscanf or sscanf (NOT textscan)


I'm having a little trouble reading a text file whose contents are arranged in columns, as shown below. I would like to store the number, character, and string columns separately in arrays.
[Numbers] [Characters] [Strings]
Now, while I have figured out how to read the number and character columns into their own arrays, I cannot seem to do so with the string column. At least, not with fscanf or sscanf, which are the commands I want to use.
How can you read a file organized as such using fscanf or sscanf? (I know about textscan, I want to know if this is possible with fscanf or sscanf).
The first thing I tried was the following:
fid = fopen('Data.txt', 'r');
B = fscanf(fid, '%d %c %s', [3,inf]);
Now while this worked fine for just the numbers and chars (i.e. B = fscanf(fid, '%d %c', [2,inf])), it fails for the above in the sense that it reads everything out of order (e.g. instead of B = [1,2,3…; a,b,c…; ABC, DEF, GHI…] I get B = [1,65,65; 66, 67, 2;…], just junk basically).
So I researched a bunch and tried out this:
fid = fopen('Data.txt', 'r');
i = 1;
while ~feof(fid)
    line = fgets(fid);
    M(i) = sscanf(line, '%d, %c, %s', [3,inf]);
    i = i+1;
end
This runs, but M ends up coming out only as a row vector consisting of the first column of numbers in the data file. It just completely ignores the existence of chars and strings.
Now, to get a better understanding of the sscanf function I tried the following
fid = fopen('Data.txt', 'r');
i = 1;
while ~feof(fid)
    line = fgets(fid);
    M(i) = sscanf(line, '%d, %d, %d', [3,inf]);
    i = i+1;
end
I ran this on a sample data set consisting of just columns of numbers. Incidentally, it does exactly the same thing as before: it reads only the first number column of the data and quits. So I basically don't know how to use sscanf, feof, or fgets properly, and I could use some help there as well.
And I know trying to read just columns of numbers is trivial with fscanf, but I'm trying to understand sscanf and fgets here.

Best Answer

Just a few alternate thoughts (and I'll think about FSCANF over the weekend a little more).
=== Using REGEXP (available in almost all languages):
With a file data.txt that has the following content (to illustrate the flexibility):
1 A ABC
2 B ABC
3 C ABC DEF
4 D ABC
5 E ABC FGH
6 F ABC
7 G ABC
8 H ABC
9 I ABC
10 J ABC
Code:
>> buffer = fileread('data.txt') ; % Could be performed with FOPEN/FREAD
% to be more generic.
>> pattern = '(?<Column1>\d+)\s(?<Column2>\w+)\s+(?<Column3>.*?)[\r\n]' ;
>> n = regexp(buffer, pattern, 'names')
n =
1x10 struct array with fields:
Column1
Column2
Column3
>> n(2)
ans =
Column1: '2'
Column2: 'B'
Column3: 'ABC'
>> n(3)
ans =
Column1: '3'
Column2: 'C'
Column3: 'ABC DEF'
>> str2double({n(:).Column1})
ans =
1 2 3 4 5 6 7 8 9 10
etc. Here I used named tokens and a struct array output, just for the fun of it. I don't think that it is what you are looking for, but I wanted to illustrate a regexp-based approach for the record.
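If separate arrays are what is needed, the struct array can be unpacked afterwards. A minimal sketch, assuming the same pattern as above: the sample text is built in memory with sprintf rather than read with fileread, and the names nums, chrs and strs are only illustrative.

```matlab
% Sample content built in memory (stands in for fileread('data.txt')).
buffer  = sprintf('1 A ABC\n2 B ABC\n3 C ABC DEF\n');
pattern = '(?<Column1>\d+)\s(?<Column2>\w+)\s+(?<Column3>.*?)[\r\n]';
n       = regexp(buffer, pattern, 'names');

nums = str2double({n.Column1});  % numeric row vector
chrs = [n.Column2];              % char row vector, assumes one char per row
strs = {n.Column3};              % cell array, one string per row
```

The third column stays in a cell array because its entries can have different lengths.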
=== Reading array of chars and converting to cell array based on position of spaces and \n and/or \r:
... to update if asked by OP.
=== Using FSCANF:
With a file data_regular.txt that has the following, more regular content:
1 A ABC
2 B ABC
3 C ABC
4 D ABC
5 E ABC
6 F ABC
7 G ABC
8 H ABC
9 I ABC
10 J ABC
Code:
fid = fopen('data_regular.txt', 'r') ;
data = cell(1e6, 3) ;                        % Prealloc.
rCnt = 0 ;                                   % Row counter.
while ~feof(fid)
    rCnt = rCnt + 1 ;
    data{rCnt,1} = fscanf(fid, '%d', 1) ;    % Number.
    data{rCnt,2} = fscanf(fid, '%s', 1) ;    % Character.
    data{rCnt,3} = fscanf(fid, '%s', 1) ;    % String.
end
fclose(fid) ;
data = data(1:rCnt,:) ;                      % Truncate.
Using this, we get:
>> data
data =
[ 1] 'A' 'ABC'
[ 2] 'B' 'ABC'
[ 3] 'C' 'ABC'
[ 4] 'D' 'ABC'
[ 5] 'E' 'ABC'
[ 6] 'F' 'ABC'
[ 7] 'G' 'ABC'
[ 8] 'H' 'ABC'
[ 9] 'I' 'ABC'
[10] 'J' 'ABC'
Note that EOF should be tested more carefully (here it is checked only once per three FSCANF calls, which assumes a well-formed file). Otherwise, the whole loop could be wrapped in a TRY/CATCH statement.
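One way to make the end-of-file handling a little more defensive is to check what each FSCANF call returns instead of relying on feof alone. This is only a sketch under assumptions: it writes its own small temporary file so that it can run on its own, and the names nums, chrs and strs are illustrative.

```matlab
% Write a small sample file so the example is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 A ABC\n2 B DEF\n3 C GHI\n');
fclose(fid);

fid  = fopen(fname, 'r');
nums = []; chrs = ''; strs = {};
while true
    num = fscanf(fid, '%d', 1);              % empty when nothing is left
    if isempty(num), break; end
    ch  = fscanf(fid, '%s', 1);
    str = fscanf(fid, '%s', 1);
    if isempty(ch) || isempty(str), break; end   % incomplete last row
    nums(end+1) = num;
    chrs(end+1) = ch(1);
    strs{end+1} = str;
end
fclose(fid);
delete(fname);
```

Stopping on an empty result means a trailing newline or a truncated last row does not add a row of empty cells.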
=== Using FGETL + SSCANF:
This is more complicated than FSCANF, because the latter advances an internal file pointer as it reads, so the next read operation picks up where the previous one stopped. SSCANF doesn't work like this: every call starts again at the beginning of the string, so you have to indicate in the format what to extract and what to skip. To illustrate:
>> s = '12 A ABC' ;
>> sscanf(s, '%d') % OK for the number.
ans =
12
>> sscanf(s, '%s') % Can we do the same for the 2nd col? KO.
ans =
12AABC
>> sscanf(s, '%*d %s', 1) % Skip # and read a 1 char string => KO, ASCII.
ans =
65
>> char(sscanf(s, '%*d %s', 1)) % => char, OK.
ans =
A
>> char(sscanf(s, '%*d %s %*s')) % Or read a string and skip next.
ans =
A
>> char(sscanf(s, '%*d %*s %s')) % Same for 3rd column, but dim KO.
ans =
A
B
C
>> char(sscanf(s, '%*d %*s %s')).' % Transpose, OK.
ans =
ABC
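Putting these pieces together, a line-by-line reader could look like the sketch below. It assumes space-separated columns with a single-word third column, writes its own temporary file so that it can run on its own, and uses illustrative names (tline, nums, chrs, strs). The reshape replaces the transpose used above so that the result is a row whether SSCANF returns a row or a column.

```matlab
% Write a small sample file so the example is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 A ABC\n2 B DEF\n10 J GHI\n');
fclose(fid);

nums = []; chrs = ''; strs = {};
fid   = fopen(fname, 'r');
tline = fgetl(fid);                    % fgetl returns -1 (not char) at EOF
while ischar(tline)
    if ~isempty(strtrim(tline))        % skip blank lines
        nums(end+1) = sscanf(tline, '%d', 1);
        c = char(sscanf(tline, '%*d %s', 1));
        chrs(end+1) = c(1);
        strs{end+1} = reshape(char(sscanf(tline, '%*d %*s %s')), 1, []);
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Note that the formats contain no commas. Literal characters in a format have to match the input, so on space-separated data a format such as '%d, %c, %s' stops right after the first number, which would explain why only the first column came back in the question's attempts.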