Clustering in Python – Adjusted Rand Index vs Adjusted Mutual Information Explained

Tags: clustering, python, scikit-learn

I am trying to evaluate clustering performance and was reading the scikit-learn documentation on metrics. I do not understand the difference between the Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI). It seems to me that they do the same thing in two different ways.

Citing from the documentation:

Given the knowledge of the ground truth class assignments labels_true and our clustering algorithm assignments of the same samples labels_pred, the adjusted Rand index is a function that measures the similarity of the two assignments, ignoring permutations and with chance normalization.

vs

Given the knowledge of the ground truth class assignments labels_true and our clustering algorithm assignments of the same samples labels_pred, the Mutual Information is a function that measures the agreement of the two assignments, ignoring permutations … AMI was proposed more recently and is normalized against chance.

Should I use both of them in my clustering evaluation or would this be redundant?

Best Answer

Short answer

  • Use ARI when the ground-truth clustering consists of large, equal-sized clusters.
  • Use AMI when the ground-truth clustering is unbalanced and contains small clusters.

Longer answer

I worked on this topic. Reference: Adjusting for Chance Clustering Comparison Measures

A one-line summary of the paper is: AMI is high when there are pure clusters in the clustering solution.

Let's have a look at an example. We have a reference clustering V consisting of 4 equal-sized clusters, each of size 25. Then we have two clustering solutions:

  • U1 that has pure clusters (many zeros in the contingency table)
  • U2 that has impure clusters

[Figure: contingency tables of the solutions U1 and U2 against the reference clustering V]

AMI will choose U1 and ARI will choose U2.
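This disagreement can be reproduced with scikit-learn. The construction below is my own illustrative one (it keeps the size-25 reference clusters from above, but the contingency tables for U1 and U2 are hypothetical, not the exact ones from the paper): U1 recovers one reference cluster perfectly and merges the other three into one big cluster; U2 is balanced, but each of its clusters mixes in a few points from the other reference clusters.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

# Reference clustering V: 4 equal-sized clusters of 25 points each.
V = np.repeat([0, 1, 2, 3], 25)

# U1: unbalanced solution with a pure cluster. Cluster 0 of V is recovered
# exactly; the remaining 75 points are merged into one big cluster.
# Contingency table: [[25, 0, 0, 0], [0, 25, 25, 25]]
U1 = np.where(V == 0, 0, 1)

# U2: balanced solution with impure clusters. Each of its 4 clusters of 25
# points takes 19 points from the "matching" cluster of V and 2 points from
# each of the other three reference clusters.
# Contingency table: 19 on the diagonal, 2 everywhere else.
U2 = np.concatenate([
    np.concatenate([np.full(19, c)] + [np.full(2, o) for o in range(4) if o != c])
    for c in range(4)
])

for name, U in [("U1 (unbalanced, pure)", U1), ("U2 (balanced, impure)", U2)]:
    print(name,
          "ARI = %.3f" % adjusted_rand_score(V, U),
          "AMI = %.3f" % adjusted_mutual_info_score(V, U))
# ARI ranks U2 above U1, while AMI ranks U1 above U2.
```

The flip happens because AMI's normalizer shrinks when the solution's entropy is low (unbalanced clusters), while ARI counts pairs and so penalizes the big mixed cluster in U1 heavily.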

Eventually:

  • U1 is unbalanced. Unbalanced solutions are more likely to contain pure clusters, so AMI is biased towards unbalanced clustering solutions.
  • U2 is balanced. ARI is biased towards balanced clustering solutions.

When we use external validity indices such as AMI and ARI, we are aiming at matching the reference clustering with our clustering solution. Hence the recommendation at the top: AMI when the reference clustering is unbalanced, ARI when it is balanced. The recommendation follows mainly from the biases of the two measures.

Also, when we have an unbalanced reference clustering with small clusters, we are even more interested in generating pure small clusters in the solution, because we want to identify the small reference clusters precisely. In a small cluster, even a single mismatched data point has a relatively large impact.

Other than the recommendations above, we could use AMI when we are interested in having pure clusters in the solution.

Experiment

Here I sketched an experiment in which a parameter P controls the generated solutions U: they are balanced when P=1 and unbalanced when P=0. You can play with the notebook here.
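I don't have the notebook's exact code, but a generator of this kind could be sketched as follows. `noisy_solution` is a hypothetical helper of my own: it starts from the reference V and relabels a fixed fraction of points, drawing the replacement labels uniformly over the 4 clusters when P=1 (balanced solutions) and sending them all to cluster 0 when P=0 (unbalanced solutions, where the untouched clusters stay pure).

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

rng = np.random.default_rng(0)
V = np.repeat([0, 1, 2, 3], 25)  # reference: 4 clusters of 25 points

def noisy_solution(p, noise=0.3):
    """Hypothetical generator: relabel a `noise` fraction of points.
    Replacement labels are uniform over the 4 clusters when p=1 (balanced)
    and concentrated on cluster 0 when p=0 (unbalanced)."""
    probs = p * np.full(4, 0.25) + (1 - p) * np.array([1.0, 0.0, 0.0, 0.0])
    U = V.copy()
    flip = rng.random(V.size) < noise
    U[flip] = rng.choice(4, size=flip.sum(), p=probs)
    return U

# Average both indices over many generated solutions for each P.
for p in [0.0, 0.5, 1.0]:
    aris, amis = [], []
    for _ in range(200):
        U = noisy_solution(p)
        aris.append(adjusted_rand_score(V, U))
        amis.append(adjusted_mutual_info_score(V, U))
    print(f"P={p:.1f}  mean ARI={np.mean(aris):.3f}  mean AMI={np.mean(amis):.3f}")
```

This is only a sketch of the experimental setup under the stated assumptions; the actual notebook may generate solutions differently.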

[Figure: ARI and AMI scores of the generated solutions as a function of P]
