MATLAB: Clusterization of data in 1-D vector

I have a large logical vector that looks like V = [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 …………..]

I need to find the position of each group of 1s (let's say, the center of each group), but if two groups of 1s are too close to each other (say, fewer than 3 zeros in between), I need to treat those groups as a single group. That is, in the first stage I need to find the groups, and then find the center element of each group (a shift of +/-1 element does not matter).

1st stage (clusterization): [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 …………..]

2nd stage (find a center of each cluster): [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 …………..]

The way I have implemented it now is the following: I smooth the entire vector (it has a couple of million elements), with the span chosen to equal the maximum expected length of a group, and then I look for local maxima (islocalmax) with 'MinSeparation' set to the minimum distance between groups. It works, but it is really slow (I have 360×180 = 64800 vectors; yes, it is a LAT/LONG grid with ~10M elements in each vector).

Is there any way to speed this up? I believe there should be some "textbook" examples of this!

A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
% Length of each group of consecutive 1s
T = table();
T.OnesLength = diff(find([0;A(:);0]==0))-1;
T(T.OnesLength==0,:) = [];
% Index of 1st '1' in each group of consecutive 1s
T.OnesStart = find(diff([0;A(:)])==1);
% Index of last '1' in each group of consecutive 1s
T.OnesStop = T.OnesStart + T.OnesLength - 1;
% Determine the number of 0s between consecutive 1s
ZerosBetween = [T.OnesStart(2:end) - T.OnesStop(1:end-1); NaN]-1;
disp(T)

    OnesLength    OnesStart    OnesStop
    __________    _________    ________
        3             4            6
        3             9           11
        6            18           23
        2            29           30
        1            32           32
        2            34           35
        1            37           37
        4            42           45

% join groups of consecutive 1s with less than n zeros between.
n = 3;
joinGroups = ZerosBetween < n;
t = find(diff([0;joinGroups])==1);
f = find(diff([0;joinGroups])==-1);
T.remove = false(height(T),1);
for i = 1:numel(t)
    T.OnesStop(t(i)) = T.OnesStop(f(i));
    T.OnesLength(t(i)) = sum(T.OnesLength(t(i):f(i))) + sum(ZerosBetween(t(i):f(i)-1));
    T.remove(t(i)+1:f(i)) = true;
end
T(T.remove,:) = [];
T.remove = [];
disp(T)

    OnesLength    OnesStart    OnesStop
    __________    _________    ________
        8             4           11
        6            18           23
        9            29           37
        4            42           45

Best Answer

There are lots of alternatives.

Input A is a vector of 1s and 0s.
n is the minimum number of 0s required between 1s to separate groups of 1s.
T is a table showing the start index, stop index, and length of each group of 1s, where runs of 1s split by fewer than n zeros are joined into a single group.

Now you can use the segment length and the start/stop indices to compute the segment centers.
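As a rough illustration of that last step, here is a minimal loop-free sketch that finds the runs of 1s, merges runs separated by fewer than n zeros, and takes the midpoint of each merged group as its center. It is not the code from the answer: the variable names (runStart, runStop, gap, keep, groupStart, groupStop, centers, C) are made up for this example, rounding the midpoint down with floor is an arbitrary choice (the question says a shift of +/-1 does not matter), and it assumes A contains at least one 1.

```matlab
A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
n = 3;                                % gaps of fewer than n zeros are bridged

% Assumption: A contains at least one 1 (an all-zero A needs a separate guard)
d = diff([0; A(:); 0]);               % +1 where a run of 1s starts, -1 just after it ends
runStart = find(d == 1);              % index of the first 1 in each run
runStop  = find(d == -1) - 1;         % index of the last 1 in each run

gap  = runStart(2:end) - runStop(1:end-1) - 1;   % number of zeros between neighbouring runs
keep = gap >= n;                      % true where a gap is wide enough to separate groups

groupStart = runStart([true; keep]);  % a group starts at the first run or after a wide gap
groupStop  = runStop([keep; true]);   % a group ends before a wide gap or at the last run

centers = floor((groupStart + groupStop) / 2);   % left-of-middle element for even lengths

C = false(size(A));                   % same layout as the "2nd stage" vector in the question
C(centers) = true;
```

For the example vector this gives the same start/stop pairs as the second table above (4 to 11, 18 to 23, 29 to 37, 42 to 45). Since it uses only diff, find, and logical indexing, it may be worth timing against the smoothing plus islocalmax approach on the real data, but no speed-up is guaranteed.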
