Following up on the data I posted here, I ran a k-means clustering analysis.
I referred to this post, "How to produce a pretty plot of the results of k-means cluster analysis?", to visualize the clusters.
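# Load the packages used below
library(NbClust)  # NbClust()
library(fpc)      # plotcluster()
library(cluster)  # clusplot()
# Read and Scale Input Data
mydata <- read.csv(file="three_county_6_25.csv", header=TRUE, sep=",") # read input data
mydata2 <- scale(mydata) # Standardize each column (mean 0, sd 1)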
# Determine number of clusters
# wssplot() is assumed to be a user-defined helper that must be defined before this call
wssplot(mydata2)
set.seed(1234)
nc <- NbClust(mydata2, min.nc=2, max.nc=15, method="kmeans")
table(nc$Best.n[1,])
# Do K-means clustering
set.seed(1234)
fit.km <- kmeans(mydata2, centers = 3, nstart=25)
# Visualize the clusters
# Fig 1
plotcluster(mydata2, fit.km$cluster)
# Fig 2
clusplot(mydata2, fit.km$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)
# Fig 3
pairs(mydata2, col=c(1:3)[fit.km$cluster])
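The script above calls wssplot() without defining it. A minimal sketch of such a helper is given here, assuming it is meant to draw the usual "elbow" plot of within-groups sum of squares against the number of clusters. The argument names `nc` and `seed` are illustrative choices, not taken from the original post.

```r
# Hypothetical reconstruction of a wssplot() helper (not from any package).
wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- numeric(nc)
  # With a single cluster, the within-groups SS equals the total SS
  wss[1] <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (k in 2:nc) {
    set.seed(seed)  # same seed for every k so the runs are comparable
    wss[k] <- sum(kmeans(data, centers = k)$withinss)
  }
  plot(1:nc, wss, type = "b",
       xlab = "Number of clusters",
       ylab = "Within-groups sum of squares")
  invisible(wss)  # return the values so they can be inspected
}
```

A bend ("elbow") in the resulting curve is commonly read as a candidate for the number of clusters, which can then be compared with what NbClust suggests.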
NbClust indicates 2 clusters:
Here are the visualizations of the clusters:
I am not sure how to interpret the cluster visualizations.
1) The first plot (Fig 1) is a "Centroid Plot against 1st 2 discriminant functions". It appears to show three groups.
2) The second plot (Fig 2) follows the "vary parameters for most readable graph" example from Quick-R: Cluster Analysis.
Best Answer
The data contain correlated variables.
k-means favors compact, roughly spherical clusters and cannot handle such correlations, so it failed badly here.
Either split your data manually based on the visualization (the left one looks reasonable), or use a different algorithm that can handle linear, elongated clusters.
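To illustrate the last suggestion, here is a small sketch on synthetic data (two parallel, elongated stripes that are made up for this example, not the data from the question). It compares k-means with single-linkage hierarchical clustering. Single linkage is only one possible choice of "different algorithm" and is an assumption of this sketch. The names `toy`, `truth`, `km`, and `hc` are illustrative.

```r
# Synthetic data (assumption for illustration): two parallel, elongated stripes
set.seed(1234)
n <- 100
x <- rep(seq(0, 10, length.out = n), 2)
truth <- rep(1:2, each = n)                     # known stripe membership
y <- x + ifelse(truth == 1, 0, 3) + rnorm(2 * n, sd = 0.1)
toy <- cbind(x = x, y = y)                      # x and y are strongly correlated

# k-means with 2 centers
km <- kmeans(toy, centers = 2, nstart = 25)

# Single-linkage hierarchical clustering, cut into 2 groups
hc <- cutree(hclust(dist(toy), method = "single"), k = 2)

# Cross-tabulate each result against the known stripe membership
print(table(truth, kmeans = km$cluster))
print(table(truth, single_linkage = hc))
```

On data shaped like this, k-means tends to cut across the stripes rather than along them, while single linkage follows each stripe. Single linkage is sensitive to stray points that bridge two groups, so it is worth checking the result against the plots before relying on it.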