Solved – The most popular hierarchical clustering algorithm (divisive scheme)

clustering, hierarchical clustering

My question: what is a "standard divisive hierarchical clustering algorithm"?

I have a well-defined similarity matrix, and have already carried out a clustering (with spectral + genetic clustering algorithms), but it's quite complicated.

I would like to show that a run-of-the-mill divisive hierarchical clustering algorithm gives worse results (I have a way of judging which results are better).

What's important: it MUST be (for reasons too political to explain) a divisive hierarchical algorithm, and it MUST use a similarity matrix (and not, for example, a distance matrix).

I would really appreciate any advice.

Best Answer

I know of very few divisive hierarchical clustering algorithms. In fact, I know exactly one: DIANA (DIvisive ANAlysis), and I would not call it "popular"; it is exotic and mostly of historical interest. An exact divisive scheme has to find the best of the O(2^n) possible ways to split a cluster in two, which is very expensive, and even heuristics don't help much in getting a good result. Top-down is not the method of choice.
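For what it's worth, here is a rough sketch of a DIANA-style heuristic that works on a similarity matrix directly, with no conversion to distances. It is my own illustration, not a reference implementation: the function names (`avg_sim`, `diana_split`, `min_internal_sim`, `diana`) are made up, the matrix is assumed to be symmetric and given as a list of lists, and I am assuming that "split the cluster with the lowest internal similarity" is an acceptable stand-in for DIANA's "split the cluster with the largest diameter".

```python
def avg_sim(sim, i, group):
    """Mean similarity between item i and the items in group (i itself excluded)."""
    others = [j for j in group if j != i]
    return sum(sim[i][j] for j in others) / len(others) if others else 0.0


def diana_split(sim, members):
    """Split one cluster in two with a DIANA-style heuristic (assumed variant)."""
    rest = list(members)
    # Seed the splinter group with the item least similar, on average, to the others.
    seed = min(rest, key=lambda i: avg_sim(sim, i, rest))
    splinter = [seed]
    rest.remove(seed)
    while len(rest) > 1:
        # A positive gain means the item is more similar to the splinter group
        # than to the items it would leave behind.
        gains = {i: avg_sim(sim, i, splinter) - avg_sim(sim, i, rest) for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        rest.remove(best)
    return splinter, rest


def min_internal_sim(sim, members):
    """Lowest pairwise similarity inside a cluster (similarity analogue of diameter)."""
    return min(sim[i][j] for i in members for j in members if i < j)


def diana(sim, n_clusters):
    """Keep splitting the least cohesive cluster until n_clusters are reached."""
    clusters = [list(range(len(sim)))]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = min(splittable, key=lambda c: min_internal_sim(sim, c))
        clusters.remove(target)
        clusters.extend(diana_split(sim, target))
    return clusters


# Toy similarity matrix: items 0-2 form one tight group, items 3-4 another.
sim = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0],
]
print(diana(sim, 2))
```

Note that each split only ever moves one item at a time, so it is a greedy heuristic and nowhere near a search over all possible splits.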

Agglomerative methods are much more popular, but they still scale badly: O(n^2) or worse (standard HAC takes O(n^3) time and O(n^2) memory). In many cases any O(n^2) method, in particular any that needs a full distance or similarity matrix, will be unacceptably expensive, which is why people keep using k-means.
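To put a number on the O(n^2) memory cost, here is a back-of-the-envelope calculation. The helper name `full_matrix_gib` is made up, and I am assuming a dense matrix of 8-byte floats with no exploitation of symmetry.

```python
def full_matrix_gib(n, bytes_per_entry=8):
    """Memory needed for a dense n x n matrix, in GiB."""
    return n * n * bytes_per_entry / 2**30


for n in (1_000, 10_000, 100_000):
    print(f"n = {n:>7}: {full_matrix_gib(n):10.3f} GiB")
```

At n = 100,000 that is roughly 74.5 GiB just to hold the matrix, before any clustering has been done.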