Solved – Evaluation measures of overlapping clustering

clustering, measurement, validity

I have a dataset of Facebook users and a set of different clustering algorithms. The goal of the project is to rank these algorithms in order to understand which of them perform well. The only information I have is the user dataset: no ground truth, no class or truth labels, nothing else. Moreover, each algorithm is designed to produce an overlapping clustering, i.e., a given user can belong to more than one cluster.

Since no ground truth is available, I must use internal indices to evaluate the clustering quality. According to this post: Evaluation measure of clustering (without having truth labels), the literature offers many indices of this kind (see this paper for more details). The problem is that none of these indices is useful when the algorithms return an overlapping clustering.
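To make the obstacle concrete, here is a minimal Python sketch. The user IDs, the clusters, and the helper names `memberships` and `to_label_vector` are all made up for illustration; they do not come from any library. It assumes that an internal index designed for hard partitions expects exactly one cluster label per user, and shows that an overlapping clustering cannot be reduced to that form without discarding memberships.

```python
# Hypothetical toy data: user IDs and clusters are invented for illustration.
users = ["u1", "u2", "u3", "u4"]
overlapping = [{"u1", "u2"}, {"u2", "u3"}, {"u4"}]  # u2 is in two clusters
disjoint = [{"u1", "u2"}, {"u3"}, {"u4"}]           # a hard partition


def memberships(users, clusters):
    """Map each user to the list of cluster indices that contain it."""
    return {u: [i for i, c in enumerate(clusters) if u in c] for u in users}


def to_label_vector(users, clusters):
    """Return one label per user; raise if the clustering is not a hard partition."""
    m = memberships(users, clusters)
    ambiguous = [u for u in users if len(m[u]) != 1]
    if ambiguous:
        raise ValueError(f"no single label for: {ambiguous}")
    return [m[u][0] for u in users]


print(to_label_vector(users, disjoint))       # one label per user
print(memberships(users, overlapping)["u2"])  # several cluster indices for u2
```

Calling `to_label_vector(users, overlapping)` raises a `ValueError`, which is the situation described above: there is no single label to hand to a partition-based index.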

Do you know some indices to evaluate the quality of overlapping clustering?

Best Answer

Have a look at:

Vendramin, L., Campello, R. J., & Hruschka, E. R. (2010).
Relative clustering validity criteria: A comparative overview.
Statistical Analysis and Data Mining, 3(4), 209-235.

and follow-up work by the same authors. I believe I saw a "feature table" there for the various indices in the literature, indicating in particular whether they support overlapping clusters. But I could not find the table (or the exact citation) right now.
