Solved – K-means clustering results on a PCA-reduced data set

k-means pca

I have a set of 321 observations of 18 correlated variables, so I use PCA to extract a low-dimensional set of features from this high-dimensional data set. I keep 9 of the 18 components (the number of components that explains 80% of the total variance). After determining the number of clusters with NbClust, I apply k-means clustering to do the classification.
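The workflow described above can be sketched in Python with NumPy alone. This is a minimal sketch, not the asker's code, under several assumptions: the data are a hypothetical random stand-in with the same shape (321 by 18), the variables are standardized before PCA (the question does not say whether they were), PCA is computed via SVD, and the number of clusters k=3 is an arbitrary placeholder because NbClust is an R package whose selection step is not reproduced here. `pca_scores` and `kmeans` are hypothetical helper names.

```python
import numpy as np


def pca_scores(X, var_threshold=0.80):
    """Standardize the columns of X (assumed step), run PCA via SVD, and keep
    the smallest number of components whose cumulative explained variance
    reaches var_threshold. Returns (scores, explained_ratio, n_components)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # variance ratio of each PC, in descending order
    n_comp = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    n_comp = min(n_comp, len(s))
    scores = U[:, :n_comp] * s[:n_comp]  # equal to Z @ Vt[:n_comp].T
    return scores, explained, n_comp


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centers and several restarts;
    returns the labels and centers of the run with the lowest inertia."""
    rng = np.random.default_rng(seed)

    def assign(centers):
        # squared Euclidean distance from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    best = (None, None, np.inf)
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            labels = assign(centers)
            # move each center to the mean of its points (keep it if the cluster is empty)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = assign(centers)
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best[2]:
            best = (labels, centers, inertia)
    return best[0], best[1]


# Hypothetical stand-in for the real data: 321 observations of 18 correlated
# variables driven by 3 latent factors plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(321, 3))
X = latent @ rng.normal(size=(3, 18)) + 0.5 * rng.normal(size=(321, 18))

scores, explained, n_comp = pca_scores(X, var_threshold=0.80)
# k=3 is an assumed placeholder; in the question k comes from NbClust (R)
labels, centers = kmeans(scores, k=3)
print(n_comp, np.round(np.cumsum(explained)[n_comp - 1], 3), np.bincount(labels, minlength=3))
```

If scikit-learn is available, `StandardScaler`, `PCA(n_components=0.80)` and `KMeans` cover the same three steps; the hand-written versions here only keep the sketch dependency-free.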


I am using PCA for dimensionality reduction to reduce the complexity of my problem, and I have an interpretation for each of the components.

My question: why are the clusters separated only in the planes that include PC1 (for example the PC1-PC2 plane, the PC1-PC3 plane, etc.)?
How can I solve this problem?

Best Answer

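Components are ordered by how much of the variability in your data each one captures, so the scores on PC1 have the largest spread. Points at opposite ends of the first component are therefore farther apart than points at opposite ends of any later component. K-means works by looking at distances between points, and the squared Euclidean distance it uses is a sum of one term per component, so the components with the largest spread dominate it. When two points are at opposite ends of the PC1 projection, their difference is much bigger than when they are at opposite ends of, say, the PC8 projection. That is why the clusters look separated in every plane that includes PC1 and overlap in the others. This is not a problem: it simply reflects where most of the variation in your data lies.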
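The argument about distances can be checked numerically. This is a sketch on hypothetical random data, not the asker's data set: the squared Euclidean distance between observations i and j on the retained scores is d²(i, j) = Σ_k (t_ik − t_jk)², and averaged over all pairs of points the term for component k equals twice the (population) variance of that component's scores. Because PCA sorts components by decreasing variance, PC1 supplies the largest share of every distance k-means works with. Keeping 9 components mirrors the question; everything else is an assumption.

```python
import numpy as np

# Hypothetical data (not the asker's): 321 observations of 18 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(321, 4)) @ rng.normal(size=(4, 18)) + 0.5 * rng.normal(size=(321, 18))

# PCA scores on standardized data (assumed step), keeping 9 components as in the question.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, _ = np.linalg.svd(Z, full_matrices=False)
scores = U[:, :9] * s[:9]

# Mean squared difference over all ordered pairs (i, j), separately per component:
# (1/n^2) * sum_ij (t_ik - t_jk)^2, which equals 2 * population variance of PC k.
n = scores.shape[0]
diffs = scores[:, None, :] - scores[None, :, :]  # shape (n, n, 9)
mean_sq = (diffs ** 2).sum(axis=(0, 1)) / n**2
share = mean_sq / mean_sq.sum()  # fraction of the average squared distance from each PC

print(np.round(share, 3))
```

In this sketch `share` decreases from PC1 to PC9 and matches each component's fraction of the retained variance; running the same computation on the real scores would show how strongly PC1 dominates the distances k-means sees.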
