I am trying to evaluate clustering performance and was reading the scikit-learn documentation on metrics. I do not understand the difference between ARI and AMI; it seems to me that they do the same thing in two different ways.
Quoting from the documentation:
Given the knowledge of the ground truth class assignments labels_true and our clustering algorithm assignments of the same samples labels_pred, the adjusted Rand index is a function that measures the similarity of the two assignments, ignoring permutations and with chance normalization.
vs
Given the knowledge of the ground truth class assignments labels_true and our clustering algorithm assignments of the same samples labels_pred, the Mutual Information is a function that measures the agreement of the two assignments, ignoring permutations … AMI was proposed more recently and is normalized against chance.
Should I use both of them in my clustering evaluation or would this be redundant?
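For concreteness, this is how I am computing the two metrics (a minimal sketch; the labellings below are made up):

```python
# Minimal sketch with made-up labels: both metrics ignore label permutations,
# so a relabelled but otherwise identical partition still scores a perfect 1.0.
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]  # same partition, cluster names permuted

print(adjusted_rand_score(labels_true, labels_pred))         # 1.0
print(adjusted_mutual_info_score(labels_true, labels_pred))  # 1.0
```

Both return 1.0 here, which is exactly why I am unsure whether reporting both adds anything.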
Best Answer
Short answer
Use ARI when the reference (ground truth) clustering has large, equal-sized clusters; use AMI when the reference clustering is unbalanced and contains small clusters.
Longer answer
I have worked on this topic; for reference, see the paper Adjusting for Chance Clustering Comparison Measures.
A one-line summary of the paper is: AMI is high when there are pure clusters in the clustering solution.
Let's look at an example. We have a reference clustering V consisting of four equal-sized clusters, each of size 25, and two candidate clustering solutions, U1 and U2. AMI will choose U1, while ARI will choose U2.
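The paper's exact U1 and U2 are not reproduced here, but the purity effect can be sketched with a hypothetical pair of solutions of my own construction: one fragments every reference cluster into two pure halves, the other fuses reference clusters pairwise into impure clusters.

```python
# Hypothetical illustration (my own construction, NOT the paper's U1/U2):
# a pure-but-fragmented solution keeps AMI comparatively high, while a
# solution that merges reference clusters into impure ones scores lower.
import numpy as np
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

V = np.repeat([0, 1, 2, 3], 25)  # reference: 4 equal clusters of 25 points

# U_pure: each reference cluster split into two *pure* halves (8 clusters).
U_pure = np.repeat(np.arange(8), [13, 12, 13, 12, 13, 12, 13, 12])

# U_merged: reference clusters fused pairwise into two *impure* clusters of 50.
U_merged = np.repeat([0, 0, 1, 1], 25)

for name, U in [("U_pure", U_pure), ("U_merged", U_merged)]:
    print(f"{name:9s} ARI={adjusted_rand_score(V, U):.3f} "
          f"AMI={adjusted_mutual_info_score(V, U):.3f}")
```

In this toy pair both indices happen to rank the pure solution higher; the point is only that purity keeps AMI high even when the solution fragments the reference clusters. The paper's actual U1/U2 are constructed so that the two indices disagree.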
In summary:
If we use external validity indices such as AMI and ARI, we are aiming to match the reference clustering with our clustering solution. Hence the recommendation at the top: AMI when the reference clustering is unbalanced, and ARI when the reference clustering is balanced. This is mainly due to the biases in the two measures.
Moreover, when the reference clustering is unbalanced and contains small clusters, we are even more interested in generating pure small clusters in the solution: we want to identify the small reference clusters precisely, and even a single mismatched data point has a relatively larger impact on them.
Beyond the recommendations above, AMI is also a good choice whenever we are specifically interested in having pure clusters in the solution.
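As an aside on the "adjusted for chance" part, a hedged sketch (with randomly drawn labels of my own choosing): for solutions unrelated to the reference, the unadjusted Rand index and NMI stay well above zero, while ARI and AMI hover around zero.

```python
# Sketch of chance adjustment: random labels, unrelated to the reference V,
# still get substantial unadjusted scores, but roughly zero adjusted scores.
import numpy as np
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             normalized_mutual_info_score, rand_score)

rng = np.random.default_rng(42)
V = np.repeat([0, 1, 2, 3], 25)  # reference: 4 equal clusters of 25 points

# 200 random labellings drawn independently of V.
draws = [rng.integers(0, 4, size=100) for _ in range(200)]

print("mean RI :", np.mean([rand_score(V, u) for u in draws]))
print("mean ARI:", np.mean([adjusted_rand_score(V, u) for u in draws]))
print("mean NMI:", np.mean([normalized_mutual_info_score(V, u) for u in draws]))
print("mean AMI:", np.mean([adjusted_mutual_info_score(V, u) for u in draws]))
```

The unadjusted means stay clearly positive even though the labels carry no information about V; the adjusted ones average out near zero, which is the whole point of the chance correction.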
Experiment
Here I sketched an experiment in which a parameter P controls the generated solutions U: they are balanced when P = 1 and unbalanced when P = 0. You can play with the notebook here.
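The notebook itself is not reproduced here; a minimal sketch of such a setup (my reconstruction, with made-up mixing proportions) might look like:

```python
# Hedged reconstruction of the experiment, not the original notebook:
# a parameter p interpolates randomly generated solutions U between
# unbalanced (p=0, one dominant cluster) and balanced (p=1, equal clusters).
import numpy as np
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

rng = np.random.default_rng(0)
n, k = 100, 4
V = np.repeat(np.arange(k), n // k)  # balanced reference clustering

def random_solution(p):
    """Draw labels i.i.d.; p=1 -> uniform (balanced), p=0 -> skewed (unbalanced)."""
    skewed = np.array([0.85, 0.05, 0.05, 0.05])   # made-up skew
    uniform = np.full(k, 1 / k)
    probs = p * uniform + (1 - p) * skewed
    return rng.choice(k, size=n, p=probs)

for p in (0.0, 0.5, 1.0):
    scores = [(adjusted_rand_score(V, random_solution(p)),
               adjusted_mutual_info_score(V, random_solution(p)))
              for _ in range(50)]
    ari, ami = np.mean(scores, axis=0)
    print(f"p={p:.1f}  mean ARI={ari:+.3f}  mean AMI={ami:+.3f}")
```

Since the generated solutions carry no information about V, both adjusted indices should average out near zero at every p; the interesting quantities in the actual experiment are the biases and variances of the two indices as the balance changes.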