MATLAB: Reading an N-column table that sometimes has N+1 columns

MATLAB, performance, textscan

Hi everyone,
I searched the forum a lot without finding a solution. And pardon my English, I am French 🙂
My problem is the following:
I have a log file with a lot of information. The log file can be up to 500 MB (sometimes even bigger). It is separated into 2 main parts: a header part, from which it is easy and fast to retrieve information line by line, and a data part.
The data part is composed of several tasks, each containing a table of data with text before and after it.
The data table structure is the following:
1 25.870 1.000 lhc 0.000 0.000 -20.140 24.449 1.42061512
2 25.870 1.000 lhc 0.000 0.000 -20.520 24.519 1.35075912
3 25.870 1.000 lhc 0.000 0.000 -20.951 24.582 1.28833133
4 25.870 1.000 lhc 0.000 0.000 -21.434 24.638 1.23204173
5 25.870 1.000 lhc 0.000 0.000 -21.958 24.689 1.18086597
6 25.870 1.000 lhc 0.000 0.000 -22.503 24.735 1.13498198
7 25.870 1.000 lhc 0.000 0.000 -23.148 24.781 1.08854135
8 25.870 1.000 lhc 0.000 0.000 -23.741 24.824 1.04623596
9 25.870 1.000 lhc 0.000 0.000 -24.244 24.863 1.00744521
10 25.870 1.000 lhc 0.000 0.000 -24.626 24.898 0.97159033
11 25.870 1.000 lhc 0.000 0.000 -24.876 24.932 0.93839531
12 25.870 1.000 lhc 0.000 0.000 -25.010 24.962 0.90779039
13 25.870 1.000 lhc 0.000 0.000 -25.057 24.990 0.87971152
14 25.870 1.000 lhc 0.000 0.000 -25.063 25.016 0.85443812
15 25.870 1.000 lhc 0.000 0.000 -25.072 25.038 0.83238819
16 25.870 1.000 lhc 0.000 0.000 -25.115 25.056 0.81396378
17 25.870 1.000 lhc 0.000 0.000 -25.220 25.070 0.79981872
18 25.870 1.000 lhc 0.000 0.000 -25.406 25.079 0.79060410
19 25.870 1.000 lhc 0.000 0.000 -25.611 25.078 0.79173920
20 25.870 1.000 lhc 0.000 0.000 -25.936 25.068 0.80208976
21 25.870 1.000 lhc 0.000 0.000 -26.373 25.047 0.82291587
22 25.870 1.000 lhc 0.000 0.000 -26.891 25.014 0.85576164
23 25.870 1.000 lhc 0.000 0.000 -27.437 24.969 0.90124460
24 25.870 1.000 lhc 0.000 0.000 -27.928 24.910 0.96048807
25 25.870 1.000 lhc 0.000 0.000 -28.254 24.835 1.03468974
26 25.870 1.000 lhc 0.000 0.000 -28.317 24.746 1.12353854
27 25.870 1.000 lhc 0.000 0.000 -28.070 24.642 1.22847010
28 25.870 1.000 lhc 0.000 0.000 -27.662 24.552 1.31821801
29 25.870 1.000 lhc 0.000 0.000 -27.101 24.452 1.41784749
30 25.870 1.000 lhc 0.000 0.000 -26.466 24.343 1.52711338
31 25.870 1.000 lhc 0.000 0.000 -25.820 24.224 1.64568471 **
As you can see in line 31, ** appears randomly as an extra (10th) column. This is just a part of the data; it goes on for thousands of lines.
I am using the following code to retrieve the data. It works fine, but I have a performance problem with big files: it takes too long. Do you have a solution to help me improve performance? My problem is the interruption caused by these **. The more of them I have, the slower it gets.
Here fid is the identifier of the currently opened file:
% Store the whole file in one variable in order to find the first and last line
% of each task and to search more quickly
outFile = textscan(fid, '%s', 'Delimiter', '\n');
frewind(fid);
% Marker strings that delimit a task table
taskSummaryFlagOn='No. goal weight pol. rot. att. 1. comp. 2. comp. residue';
taskSummaryFlagOff='Maximum of 1. component:';
% Find the rows where the task results are
needle = strfind(outFile{1}, taskSummaryFlagOn);
rowsStartTask = find(~cellfun('isempty', needle));
needle = strfind(outFile{1}, taskSummaryFlagOff);
rowsEndTask = find(~cellfun('isempty', needle));
nbStartLine = 0; nbEndLine = 2;
% Preallocate the variables for better performance
% (nbLineData is used to check that all the data are correctly retrieved)
nbLineData = zeros(numel(rowsStartTask), 1);
dataSimu = cell(numel(rowsStartTask), 9);
% Loop over the tasks
for i = 1:numel(rowsStartTask)
    nbLineData(i) = rowsEndTask(i) - rowsStartTask(i) - nbStartLine - nbEndLine;
    dataSimu(i,:) = textscan(fid, '%f %f %f %s %f %f %f %f %f', 'headerlines', rowsStartTask(i));
    % Exception when a line of data finishes with **
    while size(dataSimu{i,1}, 1) ~= nbLineData(i)
        fgetl(fid); % read the final '**'
        buff = textscan(fid, '%f %f %f %s %f %f %f %f %f');
        for j = 1:numel(buff)
            dataSimu{i,j} = [dataSimu{i,j}; buff{:,j}];
        end
    end
    frewind(fid);
end
If you need more information to understand my problem, I will provide more details.
Thanks for the time you will spend helping me 🙂

Best Answer

Assuming you don't need the '**' info, you could try this solution from the fscanf examples, which skips the remainder of the line after the data you expect:
dataSimu(i,:)=textscan(fid,'%f %f %f %s %f %f %f %f %f %*[^\n]','headerlines', rowsStartTask(i));
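To see the effect of the trailing %*[^\n] in isolation, here is a minimal sketch that parses a small in-memory sample instead of the log file. The variable names sample, fmt, rows and nRows are made up for this illustration, and the three rows are copied from the table in the question with a '**' added to the middle one.

```matlab
% Hypothetical sample: three data rows, the middle one ending with '**'.
sample = sprintf(['29 25.870 1.000 lhc 0.000 0.000 -27.101 24.452 1.41784749\n' ...
                  '30 25.870 1.000 lhc 0.000 0.000 -26.466 24.343 1.52711338 **\n' ...
                  '31 25.870 1.000 lhc 0.000 0.000 -25.820 24.224 1.64568471\n']);

% Nine data fields, then %*[^\n]: read and discard whatever is left on the line.
fmt = '%f %f %f %s %f %f %f %f %f %*[^\n]';

% textscan also accepts a character vector in place of a file identifier.
rows = textscan(sample, fmt);

% Number of rows parsed; the row carrying '**' should no longer stop the scan.
nRows = numel(rows{1});
```

If this behaves the same way on the real file, the while loop that calls fgetl and restarts textscan after every '**' should no longer be needed, which is presumably where most of the time is spent.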