MATLAB: Fastest match between 2 sequences

MATLABsequence matching alignment pattarns matching

Hi All,
I am matching many long sequences against each other in a way that the shorter sequence x slides against the longer reference sequence r in a search for the best match: the number of matching symbols (characters).
Here is my naive looped code that finds the best match. Do you have any suggestions on how to turn it into the fastest possible (vectorized) code? There might be a number of same sized shorter vectors x coming as inputs, so if possible to exploit it in the vectorization it would be ideal. I know this could be done with convolution on the binary vectors but my vectors are character symbols. Any ideas would be much appreciated.
%Finds the best match between shorter sequence x sliding against a longer reference sequence r
%The slide starts from the last element x on top of the 1st element of r:
%start (i=-2) 'THE'->
% 'ABCFGHETYUI'
%best match (i=4): 'THE'
%match score s is the number of elements in x that match (align) with r.
%match position i could be from -m+1:n+m-2 i.e. takes n+m-1 possible values
function [i,s]=find_seq(x,r)
m=numel(x); n=numel(r);
for i=1:m-1 %Counts matches in the left partial overlap
s(i)=sum(x(end-i+1:end)==r(1:i));
end
for i=1:n-m+1 %Counts matches when x fully overlaps with r
s(i+m-1)=sum(x==r(i:i+m-1));
end
for i=1:m-1 %Counts matches in the right partial overlap
s(n+i)=sum(x(1:m-i)==r(n-m+i+1:end));
end
[s,i]=max(s); i=i-m+1;

Best Answer

Another approach:
function [i, s] = find_seq_v3(x, r)
m = numel(x);
n = numel(r);
s = zeros(1, n + m - 1);
for i = 1:m-1 % left partial overlap
s(i) = sum(x(m-i+1:m) == r(1:i));
end
% Original: Long loop over r:
% for i = 1:n-m+1 % fully overlaps with r
% s(i+m-1) = sum(x == r(i:i+m-1));
% end
% Faster: Short loop over x:
q = 0; % full overlap
for i = 1:m
q = q + (x(i) == r(i:n-m+i));
end
s(m:n) = q;
for i = 1:m-1 % right partial overlap
s(n+i) = sum(x(1:m-i) == r(n-m+i+1:n));
end
[s,i] = max(s);
i = i - m + 1;
end
Speed comparison:
  • Elapsed time is 2.224595 seconds. Original
  • Elapsed time is 0.802787 seconds. Pre-allocation & UINT8
  • Elapsed time is 0.184660 seconds. Pre-allocation & UINT8 & Short loop
Just 8% of the original runtime. The FFT method of the paper might be magically fast. But some small changes can accelerate the naive implementation remarkably already.
I'm going to mex this tomorrow.