MATLAB: Best set of elements to find closest mean.

Tags: best set, fit, kmeans, mean, Statistics and Machine Learning Toolbox

Hello,
The title might be unclear, but I didn't find a better one.
Let's have an array A with two columns filled with coordinate values Ax and Ay. Those coordinates define points in a scatter plot. I use kmeans with two clusters, which gives me the coordinates of the two centroids, Acx and Acy. Now let's have an array B with, similarly, two columns of values Bx and By.
I want to find how to separate the points in B into two sets so that the mean of each set is as close as possible to the corresponding centroid (Acx, Acy). I started thinking about an approach based on minimum distance, but it seems the result can be wrong… The result may even be a single set if the points are too far from the other centroid. Since I need to repeat the process a huge number of times, I'd like it to be as fast as possible.
Any help or clue is welcome. KeFop
[EDIT] Here is an example to explain my point:
close all
clear
%points with obvious two clusters
Apts = rand(10,2);
Bpts = rand(10,2);
Bpts(:,1) = Bpts(:,1)+2;
Total = cat(1,Apts, Bpts);
%adding one point at the edge of the two clusters
Sp = mean(Total)*1.2;
Total = cat(1, Total, Sp);
%let's add the centroids, imported from a previous classification, near the means
%of the clusters
[~, ClustMean] = kmeans([Total(:,1), Total(:,2)], 2, 'MaxIter', 100, 'Display', 'off','Replicates', 3);
centroids = ClustMean;
centroids(:,1) = centroids(:,1)+0.2;
%Define the sets of points whose means are as close as possible to the imported
%centroids


%assign each point to its nearest imported centroid
for t = 1:size(Total,1)
    for c = 1:2
        EuclDist(t,c) = sqrt((Total(t,1) - centroids(c,1))^2 + (Total(t,2) - centroids(c,2))^2);
    end
    [~, ClustSelect(t)] = min(EuclDist(t,:));
end
figure
hold all
%points with colors depending on cluster

scatter(Total(ClustSelect==1,1), Total(ClustSelect==1,2), 'r');
scatter(Total(ClustSelect==2,1), Total(ClustSelect==2,2), 'b');
%mean of the two clusters
scatter(ClustMean(1,1), ClustMean(1,2), 60, 'r', 'filled');
scatter(ClustMean(2,1), ClustMean(2,2), 60, 'b', 'filled');
%centroids
scatter(centroids(1,1), centroids(1,2), 'g', 'd', 'LineWidth', 10);
scatter(centroids(2,1), centroids(2,2), 'g', 'd', 'LineWidth', 10);
legend('Points of cluster A', 'Points of cluster B','Mean of cluster A','Mean of cluster B', 'Imported centroid', 'Imported centroid', 'Location', 'northeastoutside');
%Now manually switch the point in the middle from one cluster to the
%other
ClustSelect2 = ClustSelect;
if ClustSelect2(end) == 1
ClustSelect2(end) = 2;
elseif ClustSelect2(end) == 2
ClustSelect2(end) = 1;
end
%recalculate the new means
MeanClustA = mean(Total(ClustSelect2==1,:));
MeanClustB = mean(Total(ClustSelect2==2,:));
figure
hold all
%points with colors depending on cluster
scatter(Total(ClustSelect2==1,1), Total(ClustSelect2==1,2), 'r');
scatter(Total(ClustSelect2==2,1), Total(ClustSelect2==2,2), 'b');
%mean of the two modified clusters
scatter(MeanClustA(1,1), MeanClustA(1,2), 60, 'r', 'filled');
scatter(MeanClustB(1,1), MeanClustB(1,2), 60, 'b', 'filled');
%centroids
scatter(centroids(1,1), centroids(1,2), 'g', 'd', 'LineWidth', 10);
scatter(centroids(2,1), centroids(2,2), 'g', 'd', 'LineWidth', 10);
legend('Points of cluster A', 'Points of cluster B','Mean of cluster A','Mean of cluster B', 'Imported centroid', 'Imported centroid', 'Location', 'northeastoutside');
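
The two figures compare the labelings by eye. To compare them numerically, one option is to sum the distances between each set's mean and its target centroid. The following is only a minimal sketch on made-up data: the points, the centroids, the helper name `meanOffset` and the choice of a summed Euclidean offset as the score are all assumptions for illustration and are not part of the script above.

```matlab
% Hypothetical data: six points near x = 0.5, two near x = 4.5, and one
% point in between (last row). C holds two hypothetical imported centroids.
P = [0 0; 1 0; 0 0; 1 0; 0 0; 1 0; 4 0; 5 0; 2.6 0];
C = [0.5 0; 4.5 0];

% Hypothetical helper: sum over both sets of the Euclidean distance
% between the set's mean and its target centroid.
meanOffset = @(labels) ...
    norm(mean(P(labels == 1, :), 1) - C(1,:)) + ...
    norm(mean(P(labels == 2, :), 1) - C(2,:));

% Labeling A: the middle point goes to its nearest centroid (set 2).
labelsA = [1 1 1 1 1 1 2 2 2];
% Labeling B: the middle point is switched to set 1.
labelsB = [1 1 1 1 1 1 2 2 1];

costA = meanOffset(labelsA);
costB = meanOffset(labelsB);
fprintf('nearest-centroid labeling: %.4f, switched labeling: %.4f\n', costA, costB);
```

With this particular data the switched labeling gives the smaller score, because moving the middle point into the larger set shifts that set's mean less than it shifts the mean of the smaller set. Other data, or another way of scoring, may behave differently.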

Best Answer

Simply use sqrt(). So you have two clusters. One cluster has a centroid at (acx1, acy1), and the other cluster is centered at (acx2, acy2). To find the two sets of B that will have the closest means, simply assign each point in B to whichever centroid of A it is closest to. Let's say you have two arrays, bx and by, which hold the x and y coordinates of the points in set B. Try this untested code:
distancesToACluster1 = sqrt((bx-acx1).^2 + (by-acy1).^2);
distancesToACluster2 = sqrt((bx-acx2).^2 + (by-acy2).^2);
% Find out which elements are closest to A centroid #1:
closestTo1 = distancesToACluster1 < distancesToACluster2;
% Find out which elements are closest to A centroid #2:
closestTo2 = ~closestTo1;
% Extract points from b into set 1
bx1 = bx(closestTo1);
by1 = by(closestTo1);
% Extract points from b into set 2
bx2 = bx(closestTo2);
by2 = by(closestTo2);
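
The same nearest-centroid rule can be written for an N-by-2 point array and any number of centroids, and the resulting set means can then be compared with the target centroids. This is a minimal sketch with made-up data: the names `B`, `C`, `idx`, `setMeans` and `offsets` are assumptions for illustration. It only applies the rule from the answer and does not test whether a different split would bring the means closer, which is the concern raised in the question's edit.

```matlab
% Hypothetical example data: B is N-by-2 (columns x, y), C is K-by-2 centroids.
B = [0.1 0.2; 0.3 0.1; 2.1 0.4; 2.4 0.3; 1.2 0.5];
C = [0.2 0.2; 2.2 0.3];

% Squared distance from every point to every centroid (N-by-K matrix).
% Implicit expansion requires R2016b or later. The square root is skipped
% because it does not change which centroid is nearest.
d2 = (B(:,1) - C(:,1).').^2 + (B(:,2) - C(:,2).').^2;

% Index of the nearest centroid for each point (N-by-1 vector).
[~, idx] = min(d2, [], 2);

% Mean of each resulting set. An empty set yields a row of NaN, which
% flags the single-set case mentioned in the question.
setMeans = zeros(size(C));
for k = 1:size(C,1)
    setMeans(k,:) = mean(B(idx == k, :), 1);
end

% Euclidean distance between each set mean and its target centroid.
offsets = sqrt(sum((setMeans - C).^2, 2));
```

Working on whole arrays avoids a loop over the points, which may help when the assignment has to be repeated many times.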