Solved – How to use Gower’s Distance with clustering algorithms in Python

clusteringgower-similarityk-meansmachine learningpca

I am trying to cluster by dataset with mixed features using k-means. As a distance metric, I am using Gower's Dissimilarity. I want to ask 2 things:

-Is k-means an appropriate algorithm that can accept Gower's matrix results as an input? or how can I use the output of Gower' matrix as an input of another clustering algorithm?

-Can I apply PCA or Kernel PCA on the results of Gower's Discance to decrease dimensionality so that I can visualize the results?

Best Answer

K-means does not use a distance matrix.

The method requires a data matrix, because it computes the mean. It nowhere uses pairwise distances, but only "point to mean" distances. The mean is a good choice for squared Euclidean distance. It's not particularly good for regular Euclidean. It's only defined for continuous variables. So it cannot be used with Gower's on categoricial data.

If you have a distance matrix (and little enough data to store it), then hierarchical clustering is likely the method of choice.

Yes, it probably is a good idea to use non-metric multidimensional scaling (MDS) and tSNE to check if the distance function works on your data. There is no guarantee that a distance gives useful results. If these visualization just give you a random-like blob, then do not expect the data to cluster with this distance.