Solved – Interpret the visualization of k-means clusters

clustering, data visualization, k-means

Using the data I posted here, I conducted a k-means clustering analysis.
I referred to this post, "How to produce a pretty plot of the results of k-means cluster analysis?", for the cluster visualization.

# Load the packages used below
library(NbClust)  # NbClust()
library(fpc)      # plotcluster()
library(cluster)  # clusplot()

# Read and scale the input data
mydata <- read.csv(file="three_county_6_25.csv", header=TRUE, sep=",")  # read input data
mydata2 <- scale(mydata)  # standardize each column (mean 0, sd 1)

# Determine the number of clusters
wssplot(mydata2)  # not a base R function; it must be defined before this call
set.seed(1234)
nc <- NbClust(mydata2, min.nc=2, max.nc=15, method="kmeans")
table(nc$Best.n[1,])

# Do k-means clustering
set.seed(1234)
fit.km <- kmeans(mydata2, centers=3, nstart=25)

# Visualize the clusters
# Fig 1
plotcluster(mydata2, fit.km$cluster)
# Fig 2
clusplot(mydata2, fit.km$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)
# Fig 3
pairs(mydata2, col=c(1:3)[fit.km$cluster])
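
The script calls wssplot(), which is not part of base R, so it has to be defined first. Its definition is not given in the post; the following is a minimal sketch of an elbow-plot helper with that name, assuming it plots the within-cluster sum of squares against the number of clusters. The arguments nc and seed, and the iris stand-in data, are illustrative assumptions.

```r
# Sketch of an elbow-plot helper. The name wssplot matches the call in the
# script; the body is an assumption about what that helper does.
wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- numeric(nc)
  # With a single cluster, the within-cluster sum of squares
  # equals the total sum of squares.
  wss[1] <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (k in 2:nc) {
    set.seed(seed)
    wss[k] <- kmeans(data, centers = k, nstart = 10, iter.max = 50)$tot.withinss
  }
  plot(1:nc, wss, type = "b",
       xlab = "Number of clusters",
       ylab = "Within-groups sum of squares")
  invisible(wss)  # return the values so they can be inspected
}

# Usage on a built-in stand-in data set (not the poster's CSV)
demo_data <- scale(iris[, 1:4])
wss <- wssplot(demo_data)
```

A bend ("elbow") in the resulting curve is commonly read as a candidate number of clusters, to be compared with what NbClust reports.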

NbClust indicates 2 clusters:
[Image: NbClust output]

Here are the visualizations of the clusters:
[Image: cluster plots, Fig 1 to Fig 3]

I am not sure how to interpret the cluster visualization results.
1) The 1st cluster plot is a "Centroid Plot against 1st 2 discriminant functions". It seems to show three groups.
2) The 2nd cluster plot is from clusplot, which plots the clusters against the first two principal components; "vary parameters for most readable graph" is the comment from Quick-R: Cluster Analysis.

Best Answer

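The data contains correlations: the variables are correlated, so the groups are stretched out along lines rather than compact.

k-means cannot handle such correlations, and failed badly. It minimizes the within-cluster sum of squared Euclidean distances, which favors compact, roughly spherical clusters, so it cuts across elongated groups instead of following them.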

Either split your data manually based on the visualization (the left plot looks reasonable), or use a different algorithm capable of handling linear, elongated clusters.
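
As an illustration of the difference, here is a minimal sketch on synthetic data (two parallel diagonal stripes, which is an assumption and not the poster's data). It compares k-means with single-linkage hierarchical clustering, chosen here only as one example of a method that can follow elongated groups; the answer does not name a specific algorithm. The helper agreement() is hypothetical and written for this example.

```r
# Synthetic stand-in data (an assumption, not the poster's data):
# two parallel diagonal stripes, so x and y are strongly correlated.
set.seed(1234)
n <- 200
pos <- seq(-10, 10, length.out = n)
stripe_a <- cbind(x = pos, y = pos + rnorm(n, sd = 0.3))
stripe_b <- cbind(x = pos, y = pos + 6 + rnorm(n, sd = 0.3))
xy <- rbind(stripe_a, stripe_b)
truth <- rep(1:2, each = n)

# Share of points whose label matches the true stripe,
# allowing for the two cluster labels being swapped.
agreement <- function(labels, truth) {
  hit <- mean(labels == truth)
  max(hit, 1 - hit)
}

# k-means with 2 centers
km <- kmeans(xy, centers = 2, nstart = 25)

# Single-linkage hierarchical clustering, cut into 2 groups
hc <- hclust(dist(xy), method = "single")
hc_labels <- cutree(hc, k = 2)

cat("k-means agreement:       ", agreement(km$cluster, truth), "\n")
cat("single-linkage agreement:", agreement(hc_labels, truth), "\n")

# Optional visual check, side by side
op <- par(mfrow = c(1, 2))
plot(xy, col = km$cluster, main = "k-means")
plot(xy, col = hc_labels, main = "single linkage")
par(op)
```

On data like this, k-means tends to split the cloud across its long axis, mixing both stripes in each cluster, while single linkage follows each stripe. Single linkage is sensitive to noise and to groups that touch, so on real data a model-based or density-based method may be a better fit.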