MATLAB: Euclidean distance-based clustering with predetermined number of members

Tags: classification, clustering, MATLAB, Statistics and Machine Learning Toolbox

Hello to you all,
I have a data set of points in 2-D coordinates, and I want to group these points into K clusters based on the minimum distance between them. Each cluster will have a predetermined number of members, for example five, as in the following picture. Note that the remaining data points will be left unclustered.
Is there any function in MATLAB that can help me?

Best Answer

You could easily write your own loop to do it. Just start with the closest pair of points and keep assigning nearby neighbors to that cluster until you reach the required number of points in the cluster. Then increment the cluster number and repeat to find the next cluster. Keep going until all points are gone (used up).
clc; % Clear the command window.
fprintf('Beginning to run %s.m.\n', mfilename);
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;
numPoints = 200;
pointsPerCluster = 5;
numClusters = ceil(numPoints / pointsPerCluster)
coordinates = zeros(pointsPerCluster, 2, numClusters);
t = table(zeros(numPoints, 1), zeros(numPoints, 1), zeros(numPoints, 1), zeros(numPoints, 1), 'VariableNames', {'ClusterNumber', 'PointNumber', 'x', 'y'});
xy = rand(numPoints, 2);
x = xy(:, 1);
y = xy(:, 2);
subplot(1, 2, 1);
plot(x, y, 'b.', 'MarkerSize', 14);
grid on;
% Get distances of every point to every other point


distances = pdist2(xy, xy);
minDistance = min(distances(distances~=0));
[row, ~] = find(distances == minDistance); % Rows of the closest pair(s).
currentRow = row(1); % Get first point.

pointer = 1;
for k = 1 : numClusters
    % Get distances of every point to the current seed point.
    distances = pdist2(xy, xy(currentRow, :));
    % Find the closest points (the seed itself is included, at distance 0).
    % Taking the indices from mink returns exactly pointsPerCluster rows, even when distances are tied.
    [~, rows] = mink(distances, pointsPerCluster);
    % Store these coordinates as cluster #k
    for n = 1 : length(rows)
        t.ClusterNumber(pointer) = k;
        t.PointNumber(pointer) = n;
        t.x(pointer) = xy(rows(n), 1); % Store x value.
        t.y(pointer) = xy(rows(n), 2); % Store y value.
        pointer = pointer + 1;
    end
    if pointer > numPoints
        break; % Quit when all points are used up
    end
    % Set the used rows' coordinates to infinity so we know not to consider (use) them again.
    xy(rows, :) = inf;
    % Get new cluster -- new starting point.
    % Get distances of every point to every other point
    distances = pdist2(xy, xy);
    minDistance = min(distances(distances~=0));
    [row, ~] = find(distances == minDistance);
    currentRow = row(1); % Get first point.
end
% Show clusters in unique colors
subplot(1, 2, 2);
gscatter(t.x, t.y, t.ClusterNumber);
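The question asks for only K clusters, with the leftover points staying unclustered, whereas the script above keeps going until every point is assigned. A minimal sketch of that variation follows. It is one possible adaptation, not part of the original answer: the sample data, `K`, and the `labels` vector (0 meaning "unclustered") are assumptions made for illustration. It uses `pdist2` from the Statistics and Machine Learning Toolbox and `mink`, which needs R2017b or later.

```matlab
% Sketch: build only K fixed-size clusters and leave the rest unassigned.
rng(0);                          % fixed seed so the run is repeatable
xy = rand(50, 2);                % hypothetical sample data
K = 4;                           % number of clusters wanted (assumed value)
pointsPerCluster = 5;
labels = zeros(size(xy, 1), 1);  % 0 means "unclustered"
for k = 1 : K
    idx = find(labels == 0);     % points still available
    if numel(idx) < pointsPerCluster
        break;                   % not enough points left for a full cluster
    end
    % Pairwise distances among the remaining points only.
    d = pdist2(xy(idx, :), xy(idx, :));
    d(1 : numel(idx) + 1 : end) = inf;   % ignore self-distances on the diagonal
    [~, linearIndex] = min(d(:));        % closest remaining pair
    [seedRow, ~] = ind2sub(size(d), linearIndex);
    % The seed plus its (pointsPerCluster - 1) nearest remaining neighbors.
    [~, order] = mink(d(:, seedRow), pointsPerCluster - 1);
    members = idx([seedRow; order]);
    labels(members) = k;
end
gscatter(xy(:, 1), xy(:, 2), labels);    % group 0 is the unclustered remainder
```

Working on the index list `idx` instead of overwriting used coordinates with `inf` leaves `xy` intact for plotting and avoids `NaN` distances between used points.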
Be aware that as points in closely located clusters get used up, the points available for the remaining clusters will be more spread out. I think that's obvious though, right? For example, if there are only 2 clusters in 1-D and your values are [1, 3, 5, 61, 62, 63, 64, 65, 99, 100], then cluster #1 will be [61, 62, 63, 64, 65] (tightly grouped) and cluster #2 will be [1, 3, 5, 99, 100] (widely spaced).
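The 1-D example can be checked with a few lines. This is only a sketch of the same greedy idea on the ten values quoted above. The variable names (`available`, `clusters`, `spread`) are illustrative and do not come from the script. It needs no toolbox, but it assumes R2017b or later for `mink`.

```matlab
% 1-D illustration of greedy fixed-size grouping.
v = [1; 3; 5; 61; 62; 63; 64; 65; 99; 100];
pointsPerCluster = 5;
numClusters = floor(numel(v) / pointsPerCluster);
available = true(size(v));           % points not yet assigned
clusters = cell(numClusters, 1);
for k = 1 : numClusters
    idx = find(available);           % indices of the remaining points
    r = v(idx);
    % Pairwise distances among remaining points (implicit expansion).
    d = abs(r - r');
    d(1 : numel(r) + 1 : end) = inf; % ignore self-distances on the diagonal
    [~, linearIndex] = min(d(:));    % first closest pair in column-major order
    [seedRow, ~] = ind2sub(size(d), linearIndex);
    % Take the seed and its nearest remaining neighbors.
    [~, order] = mink(abs(r - r(seedRow)), pointsPerCluster);
    members = idx(order);
    clusters{k} = sort(v(members))';
    available(members) = false;
end
% Range (max - min) of each cluster, to compare how tightly grouped they are.
spread = cellfun(@(c) max(c) - min(c), clusters);
disp(clusters{1});   % 61 62 63 64 65
disp(clusters{2});   % 1 3 5 99 100
disp(spread');       % 4 99
```

The first cluster spans 4 units and the second spans 99, which is the effect described above: later clusters are built from whatever is left over.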