Only single-linkage is optimal. Complete-linkage fails, so your intuition was wrong there. ;-)
As a simple example, consider the one-dimensional data set 1,2,3,4,5,6.
The (unique) optimum solution for complete linkage with two clusters is (1,2,3) and (4,5,6) (complete linkage height 2). The (unique) optimum solution with three clusters is (1,2), (3,4), (5,6), at linkage height 1.
There exists no agglomerative hierarchy that contains both, obviously, because these partitions do not nest. Hence, no hierarchical clustering contains all optimal solutions for complete linkage. Based on this example, I would even expect complete linkage to exhibit one of the worst gaps between the optimum solution and the agglomerative solution at high levels... And this example is likely a counterexample for most other schemes, too (except single linkage, where this data set degenerates).
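To see the gap concretely, here is a minimal sketch using SciPy's agglomerative complete linkage (an illustration only, not part of the argument; the exact labels depend on tie-breaking):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.array([[1.], [2.], [3.], [4.], [5.], [6.]])
    Z = linkage(X, method='complete')

    # 3 clusters: (1,2) (3,4) (5,6) -- the optimum, at height 1
    print(fcluster(Z, t=3, criterion='maxclust'))
    # 2 clusters: (1,2,3,4) (5,6) at height 3 -- worse than the
    # optimal (1,2,3) (4,5,6) at height 2, which does not nest above
    print(fcluster(Z, t=2, criterion='maxclust'))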
You may be interested in studying this article:
Gunnar E. Carlsson, Facundo Mémoli:
Characterization, Stability and Convergence of Hierarchical Clustering Methods. Journal of Machine Learning Research 11: 1425-1470 (2010)
and the references contained therein, in particular also:
Kleinberg, Jon M. "An impossibility theorem for clustering." Advances in neural information processing systems. 2003.
These seem to indicate that, given some mild stability requirements (which can be interpreted as uncertainty in our distance measurements), only single-linkage clustering remains. Which, unfortunately, is all but unusable in most cases... so IMHO, for practical purposes, the authors overshoot a bit, favoring pretty theory over the usefulness of the result.
k-means clearly does not make sense, because the mean does not make sense on this kind of data.
However, don't skip the preprocessing. Preprocessing is 90% of your work. For example, you can probably just drop any tag that occurs only once. You can use stemming to merge variations of the same word, etc. In general, classic text-mining techniques such as TF-IDF are worth a try.
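If you go the TF-IDF route, a minimal sketch with scikit-learn could look like this (toy documents; min_df=2 is just one way to drop tags that occur only once):

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["linux kernel driver", "kernel module linux", "cooking pasta recipe"]
    vectorizer = TfidfVectorizer(min_df=2)   # ignore terms occurring in < 2 documents
    X = vectorizer.fit_transform(docs)       # sparse TF-IDF matrix, one row per document
    print(vectorizer.get_feature_names_out())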
That you can't use k-means doesn't mean that all other clustering algorithms suffer from the same issue. First of all, you can try e.g. DBSCAN and OPTICS.
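For example, reusing the TF-IDF matrix X from the sketch above, a hedged DBSCAN call with cosine distance might look like this (eps and min_samples are placeholders you would have to tune):

    from sklearn.cluster import DBSCAN

    labels = DBSCAN(eps=0.5, min_samples=5, metric='cosine').fit_predict(X)
    # label -1 marks noise; OPTICS (sklearn.cluster.OPTICS) works similarly,
    # without committing to a single eps value.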
But maybe you aren't actually looking for clustering algorithms. Have you considered looking at groups defined implicitly, e.g. by frequent itemsets? Have you looked at classic document clustering techniques? And what about the various clustering algorithms for categorical data? There is no reason to use a vector-space-oriented method such as k-means on data that is not naturally vectorized. Search on Scholar for "clustering categorical data".
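As a hedged illustration of the frequent-itemset idea (assuming the mlxtend package is available; the support threshold is an arbitrary placeholder):

    import pandas as pd
    from mlxtend.frequent_patterns import apriori

    # one-hot tag incidence matrix: rows = objects, columns = tags
    df = pd.DataFrame({'linux':  [True, True, False],
                       'kernel': [True, True, False],
                       'pasta':  [False, False, True]})
    print(apriori(df, min_support=0.5, use_colnames=True))
    # itemsets such as {linux, kernel} define implicit groups of objects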
Best Answer
Using K-means after hierarchical clustering, or hierarchical clustering after K-means, can sometimes be a sound trick in its own right - but not because of weighting.
Frequency weighting of objects when clustering objects
Now about weighting. To do hierarchical cluster analysis of cases with frequency weights attached to the cases (objects to cluster):
Approach 1, general. Propagate objects. Multiply the weights by a constant so that the smallest individual weight becomes about 1, and then round the weights; propagate cases according to those frequencies. For example, if you have 4 groups of cases with corresponding case weights

0.55 0.23 1.98 1.14

then multiplying by 4.35 yields

2.39 1.00 8.61 4.96

and rounding gives the frequencies

2 1 9 5

Propagate each case of the corresponding group this number of times (in SPSS, a short LOOP/XSAVE syntax does the propagation). If you need 10 times greater precision relative to the original fractional weights, multiply by 10 before rounding,

23.9 10.0 86.1 49.6

so that the frequencies of propagation become

24 10 86 50

However, duplicating cases this many times may make the dataset too big for a single hierarchical cluster analysis, so don't be too demanding about precision. On the other hand, you can cluster-analyze the propagated big dataset on randomly selected subsamples, several times, and then combine the results (several approaches are possible). [Actually, to perform such resampling with replacement in SPSS you don't need to propagate the data first; however, I will stop here and won't go into the details of that syntax.]
After propagating the cases and before clustering, you may want to add tiny random noise to the quantitative features, to untie the identical cases. It will make the clustering results less dependent on the order of cases in the dataset.
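Outside SPSS, the whole of approach 1 is a few lines in, say, numpy/scipy; here is a sketch with toy data (the noise scale is an arbitrary tiny placeholder):

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 2))                  # 4 cases, 2 quantitative features
    w = np.array([0.55, 0.23, 1.98, 1.14])       # fractional case weights

    freq = np.rint(w / w.min()).astype(int)      # smallest weight scaled to ~1 -> [2 1 9 5]
    X_prop = np.repeat(X, freq, axis=0)          # propagate each case freq[i] times
    X_prop += rng.normal(scale=1e-9, size=X_prop.shape)  # tiny noise to untie duplicates

    Z = linkage(X_prop, method='average')        # cluster the propagated cases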
If you are working with an already built distance matrix (rather than the dataset), propagate its rows/columns the number of times you need. In SPSS you may use the handy matrix function !propag() for that - see my web page.

Approach 2. Use resumed agglomeration. Some implementations of hierarchical clustering (an example is my own SPSS macro for hierarchical clustering, found on my web page) allow you to interrupt the agglomeration and save the currently remaining distance matrix; that matrix has an additional column holding the within-cluster frequencies accumulated so far. The matrix can be used as input to "resume" the clustering. Now, the fact is that some methods of agglomeration - namely single (nearest neighbour), complete (farthest neighbour), between-group average (UPGMA), centroid, and median - pay no regard to the within-cluster density when they merge two clusters. Therefore, for these methods, resuming the agglomeration is equivalent to doing the agglomeration with initial frequency weights attached. So, if your program has the option to interrupt/resume the agglomeration, you may use it, under the above methods, to "simulate" weighted input successfully, and you don't need to propagate the rows/columns of the matrix.
Moreover, three methods - single, complete, and median (WPGMC) - are known to ignore even the within-cluster frequencies when they merge two clusters. Therefore frequency weighting (whether by approach 1 or approach 2) appears needless for these methods altogether: they are insensitive to it and will give the same classification of objects with weighting as without it. The only difference will be in how the dendrogram looks, because with weighting there are more objects to combine, and that shows up on the dendrogram.
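You can check this insensitivity yourself; a small sketch with toy data (single linkage shown; per the above, complete and median behave alike): duplicates merge at height 0, and the remaining merge heights coincide with those of the unweighted data.

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(1)
    X = rng.normal(size=(5, 2))
    freq = np.array([1, 3, 2, 1, 4])             # frequency weights
    X_prop = np.repeat(X, freq, axis=0)          # propagated (duplicated) cases

    h = np.unique(linkage(X, method='single')[:, 2])
    h_prop = np.unique(linkage(X_prop, method='single')[:, 2])
    print(np.allclose(h, h_prop[h_prop > 0]))    # True: same classification; only
                                                 # the dendrogram's looks differ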
As for weighting cases in the K-means clustering procedure, SPSS allows it: the procedure obeys the weighting regime. This is understandable: the K-means computation can easily and naturally incorporate integer or fractional weights while computing the cluster means. Propagation of cases should give very similar results to clustering with weighting switched on.
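Outside SPSS, too, weighted K-means is directly available; e.g. scikit-learn's KMeans accepts fractional case weights (a sketch with toy data):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 2))
    w = np.array([0.55, 0.23, 1.98, 1.14, 0.70, 2.10])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X, sample_weight=w)
    print(km.labels_)   # weighted means; propagating cases should give very similar results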
Two-step cluster analysis in SPSS, like its hierarchical clustering, doesn't support weighting cases. So the solution there is the propagation of cases described above in approach 1.