Solved – R getting 2D coordinates from kmeans

Tags: data visualization, k-means, r

I performed and plotted a kmeans analysis in R with the following commands:

 km = kmeans(t(mat2), centers = 4)
 plotcluster(t(mat2), km$cluster)      #from library(fpc)

Here is the resulting plot: [image: kmeans clustering]

This question is related to a previous question: Previous Question

My data matrix has dimensions $291 \times 31$ (after taking the transpose with t(mat2)).
What I want to know is how to create a mapping from each row in the matrix to a 2D point in the plot. My idea is to get the $31$-dimensional coordinates for each point in the plot and then compute the 2D coordinates with discrproj(). For example, I should be able to find the 2D center points of all clusters by calling discrproj() on the matrix given by km$centers (which has dimensions $4 \times 31$ and hence contains the coordinates of each cluster center in $31$-dimensional space).

However, where is the data for the coordinates in $31$ dimensional space for every 2D point in the plot? Is this data just my $291 \times 31$ data matrix? In summary:

  1. How can I create a mapping from each row in the $291 \times 31$ data matrix to a 2D point in the plot?
  2. Where/what is the data for the coordinates in $31$ dimensional space for every 2D point in the plot?

Best Answer

First, let's generate some example data and cluster it:

library(fpc)  # provides rFace, discrproj and plotcluster
data <- rFace(1000)
km <- kmeans(data, 6)

Now we can use discrproj to find a projection that separates these clusters. Note that kmeans stores the cluster assignments in km$cluster:

dp <- discrproj(data, km$cluster)

The result, dp, is a list with several potentially useful components. dp$proj contains the coordinates of the original data points projected onto the new space, with row i of dp$proj corresponding to row i of data. This space has the same dimensionality as the original space, but its first two dimensions are the ones that separate the clusters best, and those are what plotcluster actually displays.

Compare:

plot(dp$proj[, 1], dp$proj[, 2], pch = km$cluster + 48, col = km$cluster)  # +48 maps cluster numbers to the ASCII codes of the digits, so each point is drawn as its cluster number

with:

plotcluster(data, km$cluster)
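To make the row-to-point mapping explicit, here is a minimal sketch that collects the 2D coordinates in a table. It assumes the fpc package is installed; the names coords2d, mapping, dc1 and dc2 are made up for this illustration and are not part of fpc.

```r
library(fpc)

set.seed(1)                        # reproducible example data
data <- rFace(1000)                # example data generator from fpc
km <- kmeans(data, 6)
dp <- discrproj(data, km$cluster)

# Row i of coords2d holds the first two projected coordinates of row i of data
coords2d <- dp$proj[, 1:2]
colnames(coords2d) <- c("dc1", "dc2")   # hypothetical column names

# One row per observation: its index, its cluster and its 2D position
mapping <- data.frame(row = seq_len(nrow(data)),
                      cluster = km$cluster,
                      coords2d)
head(mapping)
```

Looking up mapping[i, ] then gives the projected position of the i-th row of the data matrix.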

Suppose you get some new points in your original space. You can project them into the new space using the basis vectors in dp$units (newdata here stands for a matrix with the same columns as data), like this:

newpts = newdata %*% dp$units[,1:2]
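The same multiplication should work for the cluster centers in km$centers, which is what the question was after. The sketch below assumes fpc is installed and that dp$proj is the plain product of the data with dp$units for the default method; rather than take that on trust, it checks first. The names same and centers2d are introduced here for illustration.

```r
library(fpc)

set.seed(1)
data <- rFace(1000)
km <- kmeans(data, 6)
dp <- discrproj(data, km$cluster)

# Check whether the stored projection equals data %*% units (no centering or scaling)
same <- isTRUE(all.equal(dp$proj, data %*% dp$units, check.attributes = FALSE))
print(same)

if (same) {
  # Project the cluster centers with the first two basis vectors
  centers2d <- km$centers %*% dp$units[, 1:2]
  plot(dp$proj[, 1:2], col = km$cluster)
  points(centers2d, pch = 8, cex = 2)   # mark the projected centers with stars
}
```

If the check prints FALSE for the method you chose, the projection involves more than a matrix product and new points need the same extra transformation.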

That should answer your first question. Unfortunately, I think the second part is effectively unanswerable, because the projection is not invertible: infinitely many points in the 31-dimensional space correspond to any given point in the 2D space.
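A small base R sketch of why the projection cannot be inverted. The matrix U below is a random stand-in for dp$units[, 1:2], not the real basis; any direction orthogonal to both of its columns can be added to a point without changing its 2D image.

```r
set.seed(42)
p <- 31
U <- matrix(rnorm(p * 2), nrow = p)   # hypothetical 31 x 2 projection basis
x <- rnorm(p)                          # one point in 31-dimensional space

# Columns 3..p of the complete Q factor are orthogonal to the columns of U
Q <- qr.Q(qr(U), complete = TRUE)
v <- Q[, 3]

proj_x <- drop(x %*% U)                    # 2D image of x
proj_shifted <- drop((x + 5 * v) %*% U)    # 2D image of a different 31-d point

print(all.equal(proj_x, proj_shifted))
```

Since 29 such orthogonal directions exist here, and any multiple of them can be added, the 2D point alone cannot tell you which 31-dimensional point produced it.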