How to compare the following mutual information values

clustering, mutual information

How can I compare the following mutual information values? I'm wondering what the most appropriate way is to display them in my report table.

I'm computing them with this formula

where e and c are clusters and the size of their intersection is the number of elements they have in common.

For each pair (e, c) I have an I value (mutual information). I then average over all e belonging to the same category (not shown in the formula) and end up with a table like:

cat1 0.0123
cat2 0.0012
cat3 0.0009
cat4 0.0100

The mutual information values are usually very low (around 0.01), because n (the total number of documents in the collection) is very high.

Should I use another measure, or what would you suggest?


Best Answer

Are you after the mutual information between two clusterings? Marina Meila introduced the 'variation of information' metric, which is based on mutual information, and it would be quite appropriate to use here. She also discusses alternative metric distances between clusterings. One of these (the split/join distance) is somewhat easier to interpret: it is the number of nodes that need rearranging to turn one clustering into the other.
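
To make the variation of information concrete, here is a minimal Python sketch. It assumes each clustering is given as a list of cluster labels, one per document and in the same document order, and it uses the standard definition VI(A, B) = H(A) + H(B) - 2 I(A; B) with natural logarithms. The function name and input format are illustrative choices, not something taken from the question.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A) + H(B) - 2 * I(A; B) in nats.

    labels_a and labels_b are assumed to be equal-length sequences of
    cluster labels, where position i describes the same document in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")

    n = len(labels_a)
    count_a = Counter(labels_a)                  # cluster sizes in A
    count_b = Counter(labels_b)                  # cluster sizes in B
    count_ab = Counter(zip(labels_a, labels_b))  # intersection sizes

    def entropy(counts):
        # Shannon entropy of the cluster-size distribution.
        return -sum((k / n) * log(k / n) for k in counts.values())

    # Mutual information summed over all non-empty intersections.
    mutual_info = sum(
        (k / n) * log(k * n / (count_a[a] * count_b[b]))
        for (a, b), k in count_ab.items()
    )
    return entropy(count_a) + entropy(count_b) - 2.0 * mutual_info
```

A value of 0 means the two clusterings are identical up to relabelling, and larger values mean they disagree more. For example, `variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])` is 0, while two independent two-way splits such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 2 log 2.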

Alternatively, if you are not after a clustering-to-clustering comparison but are more interested in individual pairs of clusters, you may consider using the hypergeometric P-value to assess the significance of the intersection size between two sets.
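
As a rough sketch of the hypergeometric test, the P-value for one pair of clusters can be computed as the upper tail of the hypergeometric distribution: the probability of seeing an overlap at least as large as the observed one if e were a random subset of the collection. The function and parameter names below are hypothetical, and exact integer arithmetic is used for clarity rather than speed.

```python
from math import comb


def hypergeom_overlap_pvalue(n_total, size_e, size_c, overlap):
    """Return P(X >= overlap), where X is the size of the intersection of
    cluster c (size_c items) with a random subset of size_e items drawn
    without replacement from a collection of n_total items.
    """
    if not (0 <= size_e <= n_total and 0 <= size_c <= n_total):
        raise ValueError("cluster sizes must lie between 0 and n_total")
    max_overlap = min(size_e, size_c)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap must lie between 0 and min(size_e, size_c)")

    # Number of ways to pick size_e items with exactly k of them inside c,
    # summed over k = overlap .. max_overlap. math.comb returns 0 when the
    # second argument exceeds the first, so impossible cases drop out.
    tail = sum(
        comb(size_c, k) * comb(n_total - size_c, size_e - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(n_total, size_e)
```

For example, with 10 documents, a cluster e of size 3 and a cluster c of size 4, a full overlap of 3 has P-value 4/120, or about 0.033. For a very large collection the binomial coefficients become huge, so a library routine such as SciPy's hypergeometric distribution would probably be the more practical choice.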
