Cluster analysis with k-means: how to get the cluster representatives

Tags: bioinformatics, k-means, r

I am trying to do a multivariate cluster analysis.
I have a file containing the data, and I run k-means on it as follows:

data <- read.csv("data_file")

str(data)

'data.frame':   10 obs. of  3 variables:
 $ A       : num  2.64 2.01 2.02 1.85 1.94 ...
 $ B       : num  5.45 5.14 5.16 4.82 4.92 ...
 $ C       : num  7.58 7.66 7.74 7.57 7.52 ...


data2 <- scale(data)

fit1 <- kmeans(data2, 3) 

fit1$cluster

[1] 2 2 2 1 1 1 3 3 3 3

fit1$centers

           A          B          C
1  0.1524144 -1.0545162  0.5133913
2  1.0523695  0.8632014  0.9234564
3 -0.9035879  0.1434861 -1.0776358

Now I have the three clusters and, for each cluster, the coordinates of its centroid. I would now like to pick a representative item for each cluster. It is important that the representative item is part of the data.
So my idea is to calculate the distance of each item in the data from the centroid of each cluster, and to choose as the representative of cluster X the item with the minimum distance from the centroid of cluster X.

I have already read this useful answer, but I am having trouble adapting it to my case.

I was thinking of adding a column to the centroid matrix to assign a name to each cluster (such as a, b, c, ...) and then proceeding as the other answer suggests, but unfortunately I am not getting anywhere and just get errors.

Best Answer

The obvious choice for a representative from the original data with k-means would of course be the object closest to the cluster center.
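As a minimal sketch of that idea in R, the snippet below picks, for each cluster, the member row with the smallest Euclidean distance to its centroid. The data frame here is synthetic and only stands in for the asker's file, the name `representatives` is made up for illustration, and restricting the search to each cluster's own members is an assumption (the nearest point overall could, in principle, belong to a neighbouring cluster).

```r
set.seed(1)
# Hypothetical stand-in for the real data: three well-separated groups, 10 rows
data <- data.frame(
  A = c(rnorm(4, 0), rnorm(3, 5), rnorm(3, 10)),
  B = c(rnorm(4, 0), rnorm(3, 5), rnorm(3, 10)),
  C = c(rnorm(4, 0), rnorm(3, 5), rnorm(3, 10))
)

data2 <- scale(data)
fit1 <- kmeans(data2, centers = 3, nstart = 25)

# For each cluster k, return the row index of the member closest to centroid k
representatives <- sapply(seq_len(nrow(fit1$centers)), function(k) {
  members <- which(fit1$cluster == k)
  # Subtract the centroid from every member row, then sum the squared differences
  diffs <- sweep(data2[members, , drop = FALSE], 2, fit1$centers[k, ])
  d2 <- rowSums(diffs^2)
  members[which.min(d2)]
})

# Representative rows on the original (unscaled) scale, one per cluster
data[representatives, ]
```

Because the distances are computed on `data2`, the same scaled matrix that was passed to `kmeans`, they are comparable with the centroids in `fit1$centers`. The row indices are then used to look up the original, unscaled observations.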

However, if this is your objective, you should probably be using PAM (partitioning around medoids) instead of k-means in the first place, because PAM optimizes the deviation from an actual data point (the medoid) rather than from a mean. Results from PAM are therefore expected to be better.
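A rough sketch of the PAM route, assuming the `cluster` package (one of R's recommended packages) is installed. The data frame is again synthetic and only illustrates the shape of the asker's data, and `fit2` is a made-up name.

```r
library(cluster)  # provides pam()

set.seed(1)
# Hypothetical stand-in for the real data: three well-separated groups, 10 rows
data <- data.frame(
  A = c(rnorm(4, 0), rnorm(3, 5), rnorm(3, 10)),
  B = c(rnorm(4, 0), rnorm(3, 5), rnorm(3, 10)),
  C = c(rnorm(4, 0), rnorm(3, 5), rnorm(3, 10))
)

data2 <- scale(data)
fit2 <- pam(data2, k = 3)

fit2$clustering      # cluster assignment of every row
fit2$id.med          # row indices of the medoids, one per cluster
data[fit2$id.med, ]  # the medoids as rows of the original data
```

With PAM the representatives come out of the fit directly, since each medoid is by construction one of the observations, so no separate nearest-to-centroid step is needed.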