MATLAB: Improve the performance of a function based on str2double

MATLAB, performance, regexp, str2double

Hi all, I have a function that processes lines of text read from a TXT file. Each line contains information of this type:
LINE1: N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT
LINE2: N3 A3 X1.45 ;TEXT
After the ;TEXT there may be more information of the same type, which must be ignored, for example:
LINE3: N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44
I give several example lines to show that not every line contains the same fields.
What I want is a matrix (A, for example) holding the values that follow X, Y and Z, with NaN where a line does not contain that field. For the example lines, A should be:
A = [5.45 4.45 -10.25
1.45 NaN NaN
-5.5 9.35 -1.5];
The function I am using, coordinatesCHAR, is shown below. tline is the line of text in question and matchWords is a cell array, which for this case would be: matchWords = {'X','Y','Z'};
When the number of lines is low the processing time is tolerable, but the text files I work with have thousands of lines, and the function becomes too slow to be practical.
I was able to verify that the slowest functions are str2double and regexp. Does anyone know how I can improve this?
function XYZ = coordinatesCHAR(tline,matchWords)
% Regular expression to find the start (a) and end (b) index of every number.
[a,b] = regexp(tline,'[+-]?\d+(\.\d+)?');
XYZ = NaN(1,length(matchWords));
for ii = 1:length(matchWords)
    isfind = strfind(tline,matchWords{ii});
    if ~isempty(isfind) && ~isempty(a) && ~isempty(b)
        % If isfind has more than one component take the first position
        strPos = find(a == isfind(1)+1);
        if isempty(strPos)
            XYZ(1,ii) = NaN;
        else
            XYZ(1,ii) = str2double(tline(a(strPos):b(strPos))); % Convert the number that directly follows the letter
        end
    end
end
end
I searched different forums and tried the "str2doubleq" function, but the improvement was minimal.
Thank you all very much.

Best Answer

A "dead ahead" solution without any attempt at anything fancy. Regular expressions are known to be expensive; I've never timed them against the newer string functions to know how they compare.
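function ret=coordinatesCHAR(tline,vars)
% Input line begins with text and may have trailing comments after a semicolon
ret=nan(1,numel(vars));
if contains(tline,';'), tline=extractBefore(tline,';'); end  % drop the comment
t=split(tline);                                  % whitespace-delimited tokens
t=t(contains(t,vars));                           % keep tokens containing a letter in vars
v=cellfun(@(s)sscanf(s(2:end),'%f'),t);          % number after the leading letter
ix=contains(vars,cellfun(@(s)s(1),t,'uni',0));   % which entries of vars were found
ret(ix)=v;                                       % assumes tokens appear in the same order as vars
end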
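Since the question singles out per-line calls to regexp and str2double as the bottleneck, one thing that may be worth timing is calling each of them once per letter on the whole cell array of lines, rather than once per line. The following is only a sketch under assumptions: it assumes all lines are already in memory as a cell array (called txt here, as in the test below), reuses the number pattern from the question, and has not been benchmarked against the function above.

```matlab
% Sample lines from the question, held as a cell array of char vectors
txt = {'LINE1: N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT'
       'LINE2: N3 A3 X1.45 '
       'LINE3: N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44'};
vars = {'X','Y','Z'};

% Drop everything from the first semicolon onward (the comment part)
code = regexprep(txt, ';.*$', '');

A = NaN(numel(code), numel(vars));
for k = 1:numel(vars)
    % Number that directly follows the letter at the start of a word;
    % the lookbehind keeps the letter itself out of the match
    pat = ['(?<=\<' vars{k} ')[+-]?\d+(?:\.\d+)?'];
    % One regexp call for all lines; lines without a match yield ''
    num = regexp(code, pat, 'match', 'once');
    % One str2double call per column; '' converts to NaN
    A(:,k) = str2double(num);
end
disp(A)
```

The loop runs over the three letters rather than over the thousands of lines, so the number of regexp and str2double calls no longer grows with the file size. Whether that actually beats the split/sscanf version for a given file is something to measure, for example with timeit.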
For the sample
>> A=[];
>> for i=1:numel(txt),A=[A;coordinatesCHAR(txt(i),vars)];end
>> A
A =
5.4500 4.4500 -10.2500
1.4500 NaN NaN
-5.5000 9.3500 -1.5000
>>
The revised version above was tested with
>> txt
txt =
3×1 cell array
{'LINE1: N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT' }
{'LINE2: N3 A3 X1.45 ' }
{'LINE3: N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44'}
>>
i.e., the second line has no trailing semicolon. The leading "LINE" label is immaterial, actually; it just makes the string a little longer, and the logic still works:
>> txt
txt =
3×1 cell array
{'N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT' }
{'N3 A3 X1.45' }
{'N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44'}
>> A=[];
>> for i=1:numel(txt),A=[A;coordinatesCHAR(txt(i),vars)];end
>> A
A =
5.4500 4.4500 -10.2500
1.4500 NaN NaN
-5.5000 9.3500 -1.5000
>>
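The test loops above grow A by concatenation on every pass, which is fine for three lines but is usually avoided for thousands. A minimal sketch of the same logic with A preallocated follows; the function body is inlined here only so the snippet runs as a plain script, and the isempty guard is an addition that is not in the function above.

```matlab
% Sample lines from the question
txt = {'N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT'
       'N3 A3 X1.45'
       'N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44'};
vars = {'X','Y','Z'};

% Preallocate the full result instead of growing it row by row
A = NaN(numel(txt), numel(vars));
for i = 1:numel(txt)
    tline = txt{i};
    % Drop the trailing comment, if any
    if contains(tline, ';'), tline = extractBefore(tline, ';'); end
    t = split(tline);              % whitespace-delimited tokens
    t = t(contains(t, vars));      % keep tokens containing X, Y or Z
    if isempty(t), continue; end   % nothing to extract, row stays NaN
    % Number after the leading letter of each kept token
    v = cellfun(@(s) sscanf(s(2:end), '%f'), t);
    % Which entries of vars were found (assumes tokens follow the order of vars)
    ix = contains(vars, cellfun(@(s) s(1), t, 'UniformOutput', false));
    A(i, ix) = v(:).';
end
disp(A)
```

Indexing with txt{i} passes a char vector rather than a 1×1 cell, so extractBefore and split operate on plain text.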