MATLAB: Reading a very large text file of almost regular data with empty values

fgetl, fscanf, read data, text file, textread, textscan

Hello everybody,
I am trying to import an almost regular matrix into MATLAB. I used textscan with the EmptyValue option to do it.
But it always gives the error message 'badly formated string', and I do not understand why. Could you please give me a hand?
Below is the data file. The problems with the text file are:
First, some values are missing. It would be better if I could get NaN or 0 in place of the missing values at the end of a line.
Second, the values in columns 3 and 4 are sometimes attached to each other, which also makes the import difficult.
4.417E-03 1.000E+00 2.200E+05 462 2.543878E+00 5.440884E+01
4.417E-03 1.000E+00 2.200E+05 468 2.544193E+00 7.315421E+01
4.417E-03 1.000E+00 2.200E+05 687 2.255183E+00 5.011286E+01
4.417E-03 1.000E+00 2.200E+05 943 7.015397E+00
4.417E-03 1.000E+00 2.200E+05 947 1.877077E+01
4.417E-03 1.000E+00 2.200E+0511135 2.543452E+00
4.417E-03 1.000E+00 2.200E+0511138
4.417E-03 1.000E+00 2.200E+0511141
4.417E-03 1.000E+00 2.200E+0511144 2.543891E+00 4.701584E+01
4.417E-03 1.000E+00 2.200E+0511351 2.255163E+00 4.291446E+01
4.417E-03 1.000E+00 2.200E+05 1591 2.544160E+00 2.182716E+01
4.417E-03 1.000E+00 2.200E+05 1596 2.543892E+00 3.667904E+01
4.417E-03 1.000E+00 2.200E+05 1598
4.417E-03 1.000E+00 2.200E+05 2350
4.417E-03 1.000E+00 2.200E+05 2356
4.417E-03 1.000E+00 2.200E+05 2522
4.417E-03 1.000E+00 2.200E+05 2711
The matrix I want to obtain is:
4.417E-03 1.000E+00 2.200E+05 462 2.543878E+00 5.440884E+01
4.417E-03 1.000E+00 2.200E+05 468 2.544193E+00 7.315421E+01
4.417E-03 1.000E+00 2.200E+05 687 2.255183E+00 5.011286E+01
4.417E-03 1.000E+00 2.200E+05 943 7.015397E+00 NaN
4.417E-03 1.000E+00 2.200E+05 947 1.877077E+01 NaN
4.417E-03 1.000E+00 2.200E+05 11135 2.543452E+00 NaN
4.417E-03 1.000E+00 2.200E+05 11138 NaN NaN
4.417E-03 1.000E+00 2.200E+05 11141 NaN NaN
4.417E-03 1.000E+00 2.200E+05 11144 2.543891E+00 4.701584E+01
4.417E-03 1.000E+00 2.200E+05 11351 2.255163E+00 4.291446E+01
4.417E-03 1.000E+00 2.200E+05 1591 2.544160E+00 2.182716E+01
4.417E-03 1.000E+00 2.200E+05 1596 2.543892E+00 3.667904E+01
4.417E-03 1.000E+00 2.200E+05 1598 NaN NaN
4.417E-03 1.000E+00 2.200E+05 2350 NaN NaN
4.417E-03 1.000E+00 2.200E+05 2356 NaN NaN
4.417E-03 1.000E+00 2.200E+05 2522 NaN NaN
4.417E-03 1.000E+00 2.200E+05 2711 NaN NaN
P.S. The file is very large: the data contains thousands of rows and columns. Should I do any optimisation when reading it? Thanks in advance for your help.
[EDITED]
Format='%*10E %10E %9f %5d %E %E'
opt = {'EmptyValue',NaN,'CollectOutput',1};
tmp = textscan(fid,Format,'Delimiter','','Whitespace','',opt{:});

Best Answer

Since your matrix is only almost regular, textscan will not handle it easily. I guess the problem with EmptyValue is that there are no delimiters marking the empty fields in your file.
I would suggest using fgetl and extracting the numbers from the resulting char array with regexp. You then get a cell array of char arrays, which you can convert to doubles.
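To illustrate the regexp step on a single line, here is a minimal sketch. It assumes every floating-point value is written with a two-digit exponent, as in the sample data, so that a fused field such as 2.200E+0511135 can be split after the exponent. The pattern and the variable names (sampleLine, pat, row) are my own assumptions for illustration.

```matlab
% One line from the question in which columns 3 and 4 are fused together.
sampleLine = '4.417E-03 1.000E+00 2.200E+0511135 2.543452E+00';

% Assumed pattern: a float with a two-digit exponent, otherwise a plain integer.
% The first alternative is tried first, so 2.200E+05 is consumed before 11135.
pat = '[+-]?\d+\.\d+E[+-]\d{2}|\d+';

tokens = regexp(sampleLine, pat, 'match');  % cell array of char arrays
values = str2double(tokens);                % numeric row vector

% Pad the row to six columns; missing trailing fields stay NaN.
row = nan(1, 6);
row(1:numel(values)) = values;
disp(row)
```

With this pattern, 11135 comes out as its own fourth value and the missing sixth column stays NaN. If the exponent width varies in the real file, the pattern would need to be adapted.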
For example:
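clear
fid = fopen('test.txt','rt');
% maximum number of columns
maxlength = 6;
% preallocate in blocks of step rows
step = 2;
tm = nan(step,maxlength);
% a float such as 2.200E+05 (two-digit exponent assumed) or a plain integer;
% the fixed exponent width splits fused fields such as 2.200E+0511135
pat = '[+-]?\d+\.\d+E[+-]\d{2}|\d+';
% reading line by line
k = 1;
while ~feof(fid)
    thisline = fgetl(fid);
    % fgetl returns -1 when nothing is left to read
    if ~ischar(thisline)
        break
    end
    thisline = regexp(thisline,pat,'match');
    thisline = str2double(thisline);
    tm(k,1:length(thisline)) = thisline;
    r = size(tm,1);
    % need to preallocate more?
    if k == r
        tm((r+1):(r+step), 1:maxlength) = nan;
    end
    k = k + 1;
end
fclose(fid);
% drop the preallocated rows that were never filled
tm = tm(1:k-1,:);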
For large files it makes sense to increase step, so that the array has to be grown less often.
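As a possible alternative for large files, the whole file can be read in one call and parsed in memory, so the final number of rows is known before allocating. This is only a sketch and not part of the approach above: it assumes the file fits in memory, that there are at most six columns, and that floats always carry a two-digit exponent. The temporary sample file and all variable names are made up for the illustration.

```matlab
% Write three sample lines from the question to a temporary file (illustration only).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', ...
    '4.417E-03 1.000E+00 2.200E+05 462 2.543878E+00 5.440884E+01', ...
    '4.417E-03 1.000E+00 2.200E+0511135 2.543452E+00', ...
    '4.417E-03 1.000E+00 2.200E+05 1598');
fclose(fid);

% Read everything at once and split into lines.
txt = fileread(fname);
lines = regexp(txt, '\r?\n', 'split');
lines = lines(~cellfun(@isempty, lines));   % drop empty lines, e.g. after a trailing newline

% Assumed pattern: float with two-digit exponent, otherwise a plain integer.
pat = '[+-]?\d+\.\d+E[+-]\d{2}|\d+';
tokens = regexp(lines, pat, 'match');       % one cell of matches per line

% The row count is known, so allocate once and fill.
tm = nan(numel(lines), 6);
for k = 1:numel(lines)
    v = str2double(tokens{k});
    tm(k, 1:numel(v)) = v;
end

delete(fname);                              % remove the temporary file
disp(tm)
```

Whether this is actually faster than the fgetl loop depends on the file size and available memory, so it is worth timing both on a representative file.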