If I use PCA before clustering, do I need to use PCA scores (principal components) to run the clustering?

I want to apply PCA first and then run a clustering algorithm such as k-means on the result.

My understanding is this: I run PCA to find the loadings of each original variable, then compute a score for each record as a linear combination of that row's values weighted by each PC's loadings, and finally run clustering on those PCA scores.
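The procedure described in the question can be sketched in NumPy as follows. This is a minimal illustration and assumes a few things the question leaves implicit. The data are centered before projecting, which PCA requires. "Loadings" here means the unit-length eigenvectors of the covariance matrix, although some texts scale them by the square root of the eigenvalues. The function name `pca_scores` and the toy data are hypothetical.

```python
import numpy as np

def pca_scores(X):
    """Return (scores, loadings, eigenvalues) for X with rows = records, columns = variables."""
    mean = X.mean(axis=0)
    Xc = X - mean                            # PCA operates on centered data
    C = np.cov(Xc, rowvar=False)             # p x p covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh: symmetric input, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]        # reorder components by decreasing variance
    eigvals, loadings = eigvals[order], eigvecs[:, order]
    scores = Xc @ loadings                   # each score = centered row values weighted by one PC's loadings
    return scores, loadings, eigvals

rng = np.random.default_rng(0)
# hypothetical correlated toy data: 200 records, 3 variables
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])
scores, loadings, eigvals = pca_scores(X)
print(scores.shape)  # (200, 3)
```

The `scores` array is what you would then pass to a clustering routine, for example scikit-learn's `KMeans`. Each column's sample variance equals the corresponding eigenvalue, which is what the answer below calls the "scaling" part.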

Is this correct, or do I need to do anything else before running clustering on them?

Best Answer

PCA decomposes the covariance matrix into a rotation (the eigenvectors) and a scaling (the eigenvalues, i.e. the variance along each new axis).
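In symbols, assuming $X_c$ is the centered $n \times p$ data matrix, $V$ the orthogonal matrix whose columns are the eigenvectors, and $\Lambda = \operatorname{diag}(\lambda_1 \ge \dots \ge \lambda_p)$ the eigenvalues, the decomposition and the resulting scores $Z$ are:

$$
C = \frac{1}{n-1} X_c^\top X_c = V \Lambda V^\top, \qquad
Z = X_c V, \qquad
\operatorname{Cov}(Z) = V^\top C V = \Lambda .
$$

$V$ is the rotation and $\Lambda$ is the scaling: the scores are the original points expressed in rotated coordinates, and their variances are the eigenvalues.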

If you use only the rotation, k-means will give exactly the same result as on the original data, because k-means is based on Euclidean distances and a rotation preserves them. So you gain nothing.
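A small check of the rotation-only case, as a sketch under stated assumptions. The `kmeans_labels` helper is a hypothetical plain Lloyd's k-means written here for illustration, not a library function. The two runs are given corresponding starting centers (the same points, rotated), because with random initialization two k-means runs can differ for reasons unrelated to PCA. The data are made-up three-group blobs.

```python
import numpy as np

def kmeans_labels(X, init_centers, n_iter=50):
    """Plain Lloyd's k-means with fixed initial centers (hypothetical helper, no restarts)."""
    centers = init_centers.copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

rng = np.random.default_rng(1)
# hypothetical data: three groups of 50 points in 3 dimensions
X = np.vstack([rng.normal(loc=m, size=(50, 3)) for m in (0.0, 4.0, 8.0)])
Xc = X - X.mean(axis=0)

# rotation only: project onto all eigenvectors, no rescaling, nothing discarded
_, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ V

# distances between points are unchanged by the rotation
print(np.isclose(np.linalg.norm(Xc[0] - Xc[1]), np.linalg.norm(Z[0] - Z[1])))  # True

init = Xc[[0, 50, 100]]                    # one starting point from each group
labels_orig = kmeans_labels(Xc, init)
labels_rot = kmeans_labels(Z, init @ V)    # the same starting points, rotated
print(np.array_equal(labels_orig, labels_rot))  # True
```

Since every point-to-center distance is identical in both coordinate systems, every assignment step makes the same choices, so the final partition cannot differ.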

There are two ways of using the scaling information, and they can be combined:

  1. Scale every projected attribute to unit variance (whitening).
  2. Discard attributes with low variance.
  3. Both.
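The scaling options in the list can be sketched in NumPy as below. This is an illustration under assumptions, not a prescribed recipe. The data are hypothetical, and the 95% explained-variance cutoff used for discarding components is an arbitrary example threshold; other rules (a fixed number of components, a scree plot) are equally common.

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical data: 3 variables with very different spreads
X = rng.normal(size=(300, 3)) * np.array([10.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)

eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]           # largest variance first
eigvals, V = eigvals[order], V[:, order]
Z = Xc @ V                                  # rotated scores (rotation only)

# Option 1: scale every projected attribute to unit variance (whitening)
Z_white = Z / np.sqrt(eigvals)

# Option 2: discard low-variance attributes; keep the fewest leading components
# whose cumulative share of variance reaches 95% (arbitrary example threshold)
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95) + 1)
Z_reduced = Z[:, :k]

# Option 3: both, i.e. whiten and keep only the leading k components
Z_both = Z_white[:, :k]

print(k)  # 1
```

Whitening on its own gives low-variance directions, which often carry mostly noise, the same weight as the dominant ones. That is one practical reason to combine it with discarding components.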