The task: I have a 1-D column array of coordinates, most of which occur more than once. I want to cluster them as follows:
1. Find the most frequent element.
2. Assign a cluster ID to it.
3. Add all coordinates that lie within +/-10 units of it to the same cluster.
4. Remove the clustered values from the array.
5. Continue with the next most frequent element and repeat the steps above.
6. Repeat until no remaining coordinate is repeated.
Here is my sample code:
function [clusterArray,frequencyArray] = ClusterWithModeSeed(inputArray)
% Cluster vector elements to seeds determined by elements frequency
% element should be repeated at least two times to be considered in
% a cluster
% Function Input
inputSize=size(inputArray,1);
index=(1:inputSize)';
clusterArray=zeros(inputSize,1);
cluster=0;
frequencyArray=zeros(inputSize,1);
freq=2;
while freq>1
    % Update cluster ID
    cluster=cluster+1;
    % Calculate Mode And Frequency
    [mfv,freq]=mode(inputArray);
    % All Elements In +/-10 range of the most frequent value
    tfCluster=inputArray>=mfv-10&inputArray<=mfv+10;
    % Set Cluster % Frequency Outputs
    clusterArray(index(tfCluster))=cluster;
    frequencyArray(cluster)=sum(tfCluster);
    % Update Index And Data
    inputArray(tfCluster)=[];
    index(tfCluster)=[];
end
% Remove Empty Frequency
frequencyArray(frequencyArray==0)=[];
end
Example input array (a column vector, shown here on one line):
193974 2140429 2140432 2140437 2140442 2249750 2253106 2253106 2269479 2269980 2276359 2276359 2276365 2276365 2276359 2276359 2276365 2276359 2276359 2276359 2276359 2276359 2278750 2282743 2282756
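To make the steps concrete, the first two passes over this sample can be traced by hand. The snippet below is only an illustrative sketch: the variable names (`x`, `mfv1`, `inCluster1`, `rest`, and so on) are made up for this trace and do not appear in the function above, and `abs(x - mfv) <= 10` is used as an equivalent way of writing the +/-10 range test.

```matlab
% Sample data from the post, as a column vector
x = [193974; 2140429; 2140432; 2140437; 2140442; 2249750; 2253106; 2253106; ...
     2269479; 2269980; 2276359; 2276359; 2276365; 2276365; 2276359; 2276359; ...
     2276365; 2276359; 2276359; 2276359; 2276359; 2276359; 2278750; 2282743; ...
     2282756];

% Pass 1: most frequent value and everything within +/-10 of it
[mfv1, freq1] = mode(x);             % mfv1 = 2276359, freq1 = 9
inCluster1 = abs(x - mfv1) <= 10;    % also catches the three 2276365 entries
n1 = nnz(inCluster1);                % 12 elements go to cluster 1

% Pass 2: same steps on what is left
rest = x(~inCluster1);
[mfv2, freq2] = mode(rest);          % mfv2 = 2253106, freq2 = 2
n2 = nnz(abs(rest - mfv2) <= 10);    % 2 elements go to cluster 2
```

After these two passes every remaining value is unique. `mode` then returns the smallest remaining value with a frequency of 1, so the loop as posted still assigns that value (193974) to a third cluster before the `freq>1` test fails.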
How can I optimize the algorithm above? It runs on arrays with around 1E6 elements and takes around 15-20 s. Any comments are appreciated, thank you for your time!
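One direction that might help, offered as an untimed sketch rather than a verified fix: the loop calls `mode` and deletes elements on every pass, so the work grows with the number of clusters. Because each pass removes every copy of the values it clusters, the counts of the remaining values never change, which suggests counting each distinct value once and visiting seeds in order of decreasing count. The function name `ClusterWithModeSeedFast` and the helper variables are hypothetical, the input is assumed to be a non-empty numeric column vector without NaNs, and, unlike the posted loop, this version stops as soon as the next seed occurs fewer than two times, so it does not create a final cluster from a non-repeated value.

```matlab
function [clusterArray, frequencyArray] = ClusterWithModeSeedFast(inputArray)
% Hypothetical variant (not from the original post): count distinct values
% once, then walk the seeds from most to least frequent.
[vals, ~, idx] = unique(inputArray(:));      % sorted distinct values; idx maps elements to vals
counts = accumarray(idx, 1);                 % occurrences of each distinct value
% Count descending, value ascending (mode picks the smallest value on ties)
[~, order] = sortrows([counts, vals], [-1 2]);
numVals = numel(vals);
valCluster = zeros(numVals, 1);              % cluster ID per distinct value, 0 = unassigned
frequencyArray = zeros(numVals, 1);
cluster = 0;
for k = order'
    if counts(k) < 2
        break                                % no repeated seeds are left
    end
    if valCluster(k) ~= 0
        continue                             % already absorbed by an earlier seed
    end
    cluster = cluster + 1;
    % vals is sorted, so the +/-10 window is a contiguous run around k
    lo = k;
    while lo > 1 && vals(lo-1) >= vals(k) - 10
        lo = lo - 1;
    end
    hi = k;
    while hi < numVals && vals(hi+1) <= vals(k) + 10
        hi = hi + 1;
    end
    win = (lo:hi)';
    win = win(valCluster(win) == 0);         % keep only values not yet clustered
    valCluster(win) = cluster;
    frequencyArray(cluster) = sum(counts(win));
end
frequencyArray = frequencyArray(1:cluster);  % drop unused slots
clusterArray = valCluster(idx);              % map back to the original element order
end
```

On the sample array above, this should label the twelve values around 2276359 as cluster 1 and the two 2253106 entries as cluster 2, leave everything else at 0, and return `frequencyArray = [12; 2]`. Whether it is actually faster on 1E6 elements depends on how many distinct values and clusters the data contains, so it is worth timing against the original.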
Best Answer