Solved – Using Ward’s method for clustering and Dice’s similarity coefficient for binary data

Tags: binary data, clustering, hierarchical clustering, ward

I am trying to isolate the most similar groups from a set of binary variables while minimizing variation within the clusters.

Is it valid to use Ward's method for clustering and measure similarity by Dice's coefficient for binary data? If anyone happens to have any examples of published data where this method has been used, I would be extremely grateful.

Best Answer

Ward's method minimizes the squared deviations of points from their cluster mean, much like k-means. Squared deviations correspond to squared Euclidean distance.
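For reference, the usual way to write the cost Ward's method assigns to merging two clusters is given below. The symbols are introduced here for illustration and are not from the question: $A$ and $B$ are clusters, $|A|$ and $|B|$ their sizes, and $\mu_A$, $\mu_B$ their centroids (per-variable means).

```latex
\Delta(A, B)
  = \sum_{x \in A \cup B} \lVert x - \mu_{A \cup B} \rVert^2
  - \sum_{x \in A} \lVert x - \mu_A \rVert^2
  - \sum_{x \in B} \lVert x - \mu_B \rVert^2
  = \frac{|A|\,|B|}{|A| + |B|} \, \lVert \mu_A - \mu_B \rVert^2
```

At each step the pair with the smallest $\Delta(A, B)$ is merged. Both forms are written in terms of centroids and squared Euclidean norms, which is why the criterion is tied to that particular distance.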

To better understand the relationship, have a look at the underlying math, in particular at variance. Variance is conceptually meant for continuous variables, not for binary data, where the mean is no longer a meaningful data point (it is generally not itself a binary vector).
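A small sketch of that point, using three made-up binary observations (the data and variable names are purely illustrative): the centroid that the squared deviations are measured from has fractional entries, so it does not correspond to any possible binary observation.

```python
# Three hypothetical binary observations (rows) over four variables.
cluster = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

n = len(cluster)

# Centroid: the per-variable mean of the cluster members.
centroid = [sum(column) / n for column in zip(*cluster)]

# Within-cluster sum of squared deviations from the centroid,
# the quantity that Ward's method and k-means try to keep small.
wss = sum((x - m) ** 2 for row in cluster for x, m in zip(row, centroid))

print("centroid:", centroid)
print("within-cluster sum of squares:", wss)
```

The sum of squares is still computable, but it measures spread around a point that no binary record can occupy.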

This implies that Ward's method should not be used with other distances unless you can show that they are equivalent to squared Euclidean distance in some kernel space, or perhaps that they are a Bregman divergence, or similar.

For a distance (or similarity) on binary variables such as Dice, Hamming, or Jaccard, other linkages such as single linkage and complete linkage are more meaningful and interpretable. Specifically, if you cut the hierarchy at a given distance, single linkage still means that every point in a cluster is connected to the rest of the cluster by a chain of steps of at most this distance, and complete linkage means that every point in a cluster is within this distance of every other point in the same cluster.
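The two interpretations can be checked on a toy example. The following is a minimal, deliberately naive pure-Python sketch; the helper names `dice_dissimilarity` and `agglomerate`, the sample data, and the threshold of 0.5 are all assumptions made for illustration. In practice one would more likely use a library, for example SciPy's `pdist(..., metric="dice")` together with `linkage(..., method="single")` or `method="complete"`.

```python
from itertools import combinations


def dice_dissimilarity(a, b):
    """Return 1 - 2|A and B| / (|A| + |B|) for two equal-length 0/1 sequences."""
    both = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # assumption: two all-zero vectors are treated as identical
    return 1.0 - 2.0 * both / total


def agglomerate(points, threshold, linkage=max):
    """Merge clusters until no pair is within `threshold` under the linkage.

    linkage=max gives complete linkage, linkage=min gives single linkage.
    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        return linkage(
            dice_dissimilarity(points[i], points[j]) for i in c1 for j in c2
        )

    while len(clusters) > 1:
        # Closest pair of clusters under the chosen linkage.
        d, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: each row overlaps only with its neighbours, forming a chain.
points = [
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
]

single = agglomerate(points, threshold=0.5, linkage=min)
complete = agglomerate(points, threshold=0.5, linkage=max)
print("single linkage:  ", single)
print("complete linkage:", complete)
```

With this data, neighbouring rows have a Dice dissimilarity of 0.5 and all other pairs have 1.0. Single linkage therefore chains all four rows into one cluster, while complete linkage stops at two clusters in which every pair of members is within 0.5 of each other.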