In cluster analysis, can you use Gower's coefficient of similarity with the k-means clustering method?

clustering, gower-similarity, k-means, mixed type data

I am researching cluster analysis and am interested in data sets that contain both categorical and continuous variables, for which I have read that Gower's similarity coefficient is a good proximity measure. I have also read that Gower's coefficient is generally not compatible with Ward's method, so I plan to cluster first using average linkage. For content validity purposes, I would also like to compare the resulting cluster structure with that of another clustering method, specifically k-means, using the number of clusters and the initial centers obtained from the average linkage solution. Is Gower's coefficient of similarity a compatible proximity measure for the k-means method?

Best Answer

K-means is really only sensible with squared Euclidean distance.

For the algorithm to be guaranteed to converge, its two steps (assigning points to the nearest center and recomputing the centers) must optimize the same objective function.

Recomputing the mean minimizes the within-cluster sum of squares (the mean is the least-squares estimator!). The distance function used in the assignment step must therefore optimize the same objective, unless you also replace the mean with a different kind of center.
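To make the least-squares point concrete, here is a minimal sketch in plain Python. The toy values and the helper names `sum_sq` and `sum_abs` are made up for illustration. It checks by brute force which center minimizes each objective on a one-dimensional cluster.

```python
from statistics import mean, median

def sum_sq(points, c):
    """Sum of squared distances from every point to the center c."""
    return sum((x - c) ** 2 for x in points)

def sum_abs(points, c):
    """Sum of absolute (Manhattan) distances from every point to the center c."""
    return sum(abs(x - c) for x in points)

# Hypothetical one-dimensional cluster, chosen so the mean and median differ.
points = [1, 2, 3, 10, 14]

# Brute-force search over integer candidate centers.
candidates = range(0, 15)
best_sq = min(candidates, key=lambda c: sum_sq(points, c))
best_abs = min(candidates, key=lambda c: sum_abs(points, c))

print(mean(points), best_sq)     # the mean is the best center for squared distance
print(median(points), best_abs)  # the median is the best center for absolute distance
```

On this data the mean is optimal for the squared objective but not for the absolute one, so an update step that recomputes means can make a non-squared objective worse. That is why the convergence argument breaks once the distance and the center no longer match.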

Last but not least, using Gower usually implies that you have categorical attributes. How would you compute a mean or centroid for those in the first place?
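For reference, here is a rough sketch of how a Gower dissimilarity (one minus the similarity) can be computed for mixed records. It assumes equal weights, no missing values, and a nonzero range for every numeric field. The function name `gower_dissimilarity` and the toy records are hypothetical. Note that it only ever compares two records with each other, so it never needs a centroid.

```python
def gower_dissimilarity(a, b, numeric_ranges):
    """Gower dissimilarity between two records (equal weights, no missing values).

    numeric_ranges maps the index of each numeric field to its range
    (max - min over the data set, assumed nonzero); every other field
    is treated as categorical.
    """
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if k in numeric_ranges:
            # Numeric field: absolute difference scaled by the field's range.
            total += abs(x - y) / numeric_ranges[k]
        else:
            # Categorical field: 0 if the categories match, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Hypothetical mixed-type records: (age, colour).
records = [
    (20, "red"),
    (30, "red"),
    (40, "blue"),
]
ages = [r[0] for r in records]
numeric_ranges = {0: max(ages) - min(ages)}

d01 = gower_dissimilarity(records[0], records[1], numeric_ranges)
d02 = gower_dissimilarity(records[0], records[2], numeric_ranges)
d12 = gower_dissimilarity(records[1], records[2], numeric_ranges)
print(d01, d02, d12)
```

A pairwise matrix built this way is the kind of input that linkage-based hierarchical methods accept. There is still no obvious "average" of "red" and "blue" to serve as a k-means center.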
