MATLAB: Optimizing Clustering of Array Elements

array manipulationsclusteringoptimizationion

The Task: a 1-D column array of coordinates exists. Most of those coordinates repeat more than once.
1. look for the most repeated element; 2. assign a cluster id to it; 3. include coordinates that lie in +/-10 units to the same cluster 4. remove clustered values from the array 5. proceed with the second most repeated element and loop the same steps as above 6. do the procedure until coordinates are not repeated
Here is a sample code:
function [clusterArray,frequencyArray] = ClusterWithModeSeed(inputArray)
% Cluster vector elements to seeds determined by elements frequency
% element should be repeated at least two times to be considered in
% a cluster
% Function Input
inputSize=size(inputArray,1);
index=(1:inputSize)';
clusterArray=zeros(inputSize,1);
cluster=0;
frequencyArray=zeros(inputSize,1);
freq=2;
while freq>1
% Update cluster ID
cluster=cluster+1;
% Calculate Mode And Frequency
[mfv,freq]=mode(inputArray);
% All Elements In +/-10 range of the most frequent value
tfCluster=inputArray>=mfv-10&inputArray<=mfv+10;
% Set Cluster % Frequency Outputs
clusterArray(index(tfCluster))=cluster;
frequencyArray(cluster)=sum(tfCluster);
% Update Index And Data
inputArray(tfCluster)=[];
index(tfCluster)=[];
end
% Remove Empty Frequency
frequencyArray(frequencyArray==0)=[];
end
Example Input Array : ans =
193974
2140429
2140432
2140437
2140442
2249750
2253106
2253106
2269479
2269980
2276359
2276359
2276365
2276365
2276359
2276359
2276365
2276359
2276359
2276359
2276359
2276359
2278750
2282743
2282756
How can I optimize the above algorithm? It uses arrays with around 1E6 Elements and takes around 15-20sec. Any comments are appreciated, thank you for your time!

Best Answer

Don't do this:
% Update Index And Data
inputArray(tfCluster)=[];
index(tfCluster)=[];
Changing the size of a big array takes time. Lots of it. So save the parts you need to get rid of and get rid of them at the end. Or set them to nan or something else your program can ignore.