MATLAB: Improve the performance of a function based on str2double

Hi all, I have a function that takes a line of text from a TXT file containing information of the following type:

    LINE1:    N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT
    LINE2:    N3 A3 X1.45 ;TEXT

…

After the ;TEXT there may be more information of the same type, which must be ignored, for example:

    LINE3:    N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44

I give two example lines to try to show that not all lines always contain the same information.

What I want to obtain is a matrix (for example A) holding the values that appear after X, Y or Z, with NaN wherever a line does not contain that value. For the example lines, A should be:

A = [5.45 4.45 -10.25
    1.45 NaN NaN
    -5.5 9.35 -1.5];

The function I am using is coordinatesCHAR, shown below. tline is the line of text in question and matchWords is a cell array, which for this case would be: matchWords = {'X','Y','Z'};

Even when the number of lines is low, the processing time is relatively high, and the text files I am working with have several thousand lines, so this approach is not practical.

I was able to verify that the slowest functions were str2double and regexp. Does anyone know how I can improve this?

function XYZ = coordinatesCHAR(tline,matchWords)
% Regular expression: start (a) and end (b) indices of every number in the line.
[a,b]  = regexp(tline,'[+-]?\d+(\.\d+)?');
XYZ = NaN(1,length(matchWords));
for ii = 1:length(matchWords)
    isfind = strfind(tline,matchWords{ii});
    if  ~isempty(isfind) && ~isempty(a) && ~isempty(b)
        % If isfind has more than one component take the first position
        strPos = find(a == isfind(1)+1);
        if isempty(strPos)
            XYZ(1,ii) = NaN;
        else
            XYZ(1,ii) = str2double(tline(a(strPos):b(strPos)));   % Get the value upto next character
        end
    end
end

I searched in different forums and tried using the "str2doubleq" function, but the improvement was minimal.

Thank you very much.
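The observation that str2double and regexp dominate the run time can be checked in isolation with timeit. The following is only a sketch: the sample line is LINE1 from the question without its label, and the variable names (numStr, tRegexp, tStr2double, tSscanf) are illustrative. sscanf is included because it is a common lighter-weight alternative to str2double for a single plain number.

```matlab
% Sample data taken from LINE1 of the question (label omitted).
tline  = 'N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT';
numStr = '-10.25';                     % one numeric token from that line

% timeit runs each handle several times and returns a typical
% execution time in seconds for a single call.
tRegexp     = timeit(@() regexp(tline, '[+-]?\d+(\.\d+)?'));
tStr2double = timeit(@() str2double(numStr));
tSscanf     = timeit(@() sscanf(numStr, '%f'));

fprintf('regexp:     %.2e s per call\n', tRegexp);
fprintf('str2double: %.2e s per call\n', tStr2double);
fprintf('sscanf:     %.2e s per call\n', tSscanf);
```

The absolute numbers depend on the machine and MATLAB release, so only the ratios between the three are meaningful. Multiplying a per-call time by the number of lines and match letters gives a rough estimate of the total cost for a whole file.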

function ret=coordinatesCHAR(tline,vars)
% for input line beginning with text, may have trailing comments after semicolon
  ret=nan(1,numel(vars));
  if contains(tline,';'), tline=extractBefore(tline,';'); end
  t=split(tline);
  t=t(contains(t,vars));
  v=cellfun(@(s)sscanf(s(2:end),'%f'),t);
  ix=contains(vars,cellfun(@(s)s(1),t,'uni',0));
  ret(ix)=v;
end

>> txt
txt =
  3×1 cell array
    {'N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT'    }
    {'N3 A3 X1.45'                        }
    {'N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44'}
>>

Best Answer

"Deadahead" solution without any attempt to use anything fancy...regular expressions are known to be expensive; I've never compared/timed relative to the new string functions to know where they stack up...

For the sample

>> A=[];
>> for i=1:numel(txt),A=[A;coordinatesCHAR(txt(i),vars)];end
>> A
A =
    5.4500    4.4500  -10.2500
    1.4500       NaN       NaN
   -5.5000    9.3500   -1.5000
>>

Revised above tested with

>> txt
txt =
  3×1 cell array
    {'LINE1:    N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT'    }
    {'LINE2:    N3 A3 X1.45 '                       }
    {'LINE3:    N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44'}
>>

i.e., with a line that has no trailing semicolon. The leading "LINE" label is immaterial, actually; it just makes the string a little longer, and the logic still works.
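A different option, not part of the answer above, is to process all lines at once so that regexp and str2double are each called once per letter rather than once per line. The sketch below assumes the lines are already in a cell array, that each letter is preceded by whitespace and appears at most once before the semicolon, and that numbers have the same form as in the question's regular expression. The names code, pat and tok are illustrative.

```matlab
% Sample lines from the question.
txt = {'LINE1:    N1 A2 X5.45 Y4.45 Z-10.25 ;TEXT'
       'LINE2:    N3 A3 X1.45 '
       'LINE3:    N1 A2 X-5.5 Y9.35 Z-1.5 ;X25 Y-4.44'};
vars = {'X','Y','Z'};

% Remove everything from the first semicolon onward in every line.
code = regexprep(txt, ';.*$', '');

A = NaN(numel(txt), numel(vars));      % preallocate the result
for k = 1:numel(vars)
    % Number that directly follows "<whitespace><letter>".
    pat = ['(?<=\s' vars{k} ')[+-]?\d+(?:\.\d+)?'];
    % With a cell input and 'once', regexp returns one char vector per
    % line, empty where the letter is missing.
    tok = regexp(code, pat, 'match', 'once');
    % str2double converts the whole cell array; empty entries become NaN.
    A(:,k) = str2double(tok);
end
disp(A)
```

Whether this is actually faster than the token-splitting function depends on the file and the MATLAB release, so it is worth timing both on real data before switching.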
