Solved – Silhouette Score and Dimensionality reduction

clustering, dimensionality reduction

I have observed that when I significantly reduce the dimensionality of my data, the silhouette score increases drastically. I reduced the dimensionality so that only 10% of the variance is retained.

With no dimensionality reduction, I get silhouette scores of ~0 on average. With dimensionality reduction, keeping only 10% of the variance, I get a score of ~0.78.

Based on the silhouette score, is the data actually better clustered in this low dimensionality, or have I manipulated the data too much for this score to be reliable?

Best Answer

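Never compare silhouette scores across different preprocessing, and in particular not across different feature sets.

This is comparing apples and oranges: the score is computed from pairwise distances, and those distances change when the features change.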
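
To make this concrete, here is the standard definition of the silhouette of a single point $i$ (the symbols are the usual textbook ones, not taken from the question):

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} $$

Here $a(i)$ is the mean distance from $i$ to the other points in its own cluster, and $b(i)$ is the smallest mean distance from $i$ to the points of any other cluster. Both terms are distances in whatever feature space you pass in, so discarding most of the variance changes every quantity in the formula, not just the clustering.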

If you want to see whether the clusters found after PCA are better, compute the silhouette on the original data using the cluster labels obtained from the reduced data.
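
A minimal sketch of that comparison, assuming scikit-learn, k-means as the clustering algorithm, and synthetic blobs standing in for the real data (the question names none of these, so the dataset, `n_components=2`, and `n_clusters=3` are all placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Assumption: synthetic stand-in for the real data (3 blobs in 20 dimensions).
X, _ = make_blobs(n_samples=300, n_features=20, centers=3, random_state=0)

# Reduce the data, then cluster in the reduced space.
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X)
labels_reduced = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)

# Cluster in the original space as the baseline.
labels_original = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Comparable: both scores use distances in the ORIGINAL feature space.
score_original = silhouette_score(X, labels_original)
score_pca_labels = silhouette_score(X, labels_reduced)

# Not comparable with the two above: distances come from the reduced space.
score_in_reduced_space = silhouette_score(X_reduced, labels_reduced)

print("original labels, original data:", score_original)
print("PCA labels, original data:     ", score_pca_labels)
print("PCA labels, reduced data:      ", score_in_reduced_space)
```

Only the first two numbers answer the question of whether PCA led to a better clustering. The third is measured with different distances, so a large value there says little on its own.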