Solved – Gower’s (dis)similarity index

Tags: clustering, distance, gower-similarity, metric, ward

I would like to ask a question about Gower similarity/dissimilarity index.
Is it ok to use the Gower dissimilarity measure with Ward linkage clustering?
I was reading that the Gower similarity index should not be used with Ward linkage because the index is not metric.
I was wondering whether this applies only to the similarity index and not to the dissimilarity index, which can also handle ordinal variables?

Best Answer

  1. Gower dissimilarity is just 1 minus Gower similarity, $1-GS$. So, they are "the same", and limitations of one are the limitations of the other.
  2. Ward clustering computes cluster centroids, and for those to be geometrically "real" it requires (squared) Euclidean distances as input. Euclidean distance is metric, but not every metric distance is Euclidean, so not every metric distance is correct for Ward. Still, in practice, metric distances that are not Euclidean can be used with Ward's method heuristically. Non-metric distances are not recommended with Ward at all.
  3. By origin, Gower dissimilarity is non-Euclidean and non-metric (even when all the variables used to compute it are interval, the Gower index will be closer to Manhattan distance than to Euclidean distance), so you cannot use it with Ward.
  4. However, geometrically, a particular Gower dissimilarity matrix could happen to be close to Euclidean, and then you may be justified in using Ward (with these specific data only!). To check whether a dissimilarity matrix is (close to) Euclidean, double-center it and inspect the eigenvalues of the resulting matrix. The smaller the sum of the negative eigenvalues relative to the sum of the positive ones, the closer the dissimilarities are to Euclidean distances. But even in that case, using Ward with Gower distance is purely heuristic.
  5. Gower dissimilarity defined as $\sqrt{1-GS}$ is actually a Euclidean distance (and therefore automatically metric) when no specially processed ordinal variables are used. After double-centering, the matrix has no negative eigenvalues (so the points can be embedded in Euclidean space). So just use this version of the dissimilarity if you want to use methods that require Euclidean space and if taking the square root is an acceptable transform in your study setting.
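
The eigenvalue check from points 4 and 5 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation. The helper names `gower_similarity` and `negative_eigenvalue_ratio` are made up for this example, and the data are random. The Gower similarity here covers only range-scaled numeric columns and nominal columns, with no missing values and no special ordinal handling. "Double-centering" is taken in the usual classical-scaling sense, $B = -\tfrac{1}{2} J D^{(2)} J$, where $D^{(2)}$ holds the squared dissimilarities and $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$.

```python
import numpy as np

def gower_similarity(X_num, X_cat):
    """Gower similarity GS for numeric + nominal columns (hypothetical helper).

    Assumes no missing values, no ordinal variables, and that every
    numeric column has a nonzero range.
    """
    ranges = X_num.max(axis=0) - X_num.min(axis=0)
    # Numeric variables: per-variable similarity 1 - |x_i - x_j| / range
    num_sim = 1.0 - np.abs(X_num[:, None, :] - X_num[None, :, :]) / ranges
    # Nominal variables: per-variable similarity 1 if categories match, else 0
    cat_sim = (X_cat[:, None, :] == X_cat[None, :, :]).astype(float)
    # GS is the mean of the per-variable similarities
    return np.concatenate([num_sim, cat_sim], axis=2).mean(axis=2)

def negative_eigenvalue_ratio(D):
    """Sum of |negative eigenvalues| over sum of positive eigenvalues
    of the double-centered matrix B = -0.5 * J @ D**2 @ J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eig = np.linalg.eigvalsh((B + B.T) / 2)  # symmetrize against round-off
    return -eig[eig < 0].sum() / eig[eig > 0].sum()

# Random illustrative data: 40 cases, 3 numeric and 2 nominal variables
rng = np.random.default_rng(0)
X_num = rng.normal(size=(40, 3))
X_cat = rng.integers(0, 3, size=(40, 2))

GS = gower_similarity(X_num, X_cat)
D_plain = 1.0 - GS                               # dissimilarity 1 - GS
D_sqrt = np.sqrt(np.clip(1.0 - GS, 0.0, None))   # dissimilarity sqrt(1 - GS)

ratio_plain = negative_eigenvalue_ratio(D_plain)
ratio_sqrt = negative_eigenvalue_ratio(D_sqrt)
print("1 - GS       :", ratio_plain)
print("sqrt(1 - GS) :", ratio_sqrt)
```

A ratio at round-off level for `D_sqrt` is what point 5 predicts, while a clearly larger ratio for `D_plain` indicates a departure from Euclidean geometry. If the square-root version is acceptable for the study, `D_sqrt` is the matrix to pass to a Ward implementation.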