Solved – How to calculate mutual information from frequencies

biostatistics, contingency tables, mutual information

Can someone explain how to calculate mutual information from a contingency table?

I have a contingency table containing counts from a sample of data:

[Image: contingency table]

I want to calculate the mutual information between motif and condition.
Since the mutual information formula requires probabilities, how can I estimate it from frequencies? Alternatively, how can I obtain the distribution of the mutual information?

Best Answer

I'm not a stats specialist, but I will give it a shot.

First, we can approximate the probability of each event by its empirical probability, i.e. the number of occurrences divided by the total number of observations:

$p(\text{motif}_i, \text{condition}_j) = \frac{n_{ij}}{\sum_{k,l} n_{kl}}$

where $n_{ij}$ is the number of occurrences of motif $i$ with condition $j$, and the sum in the denominator runs over every cell of the table.
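As a rough sketch of this normalization in Python: the counts below are hypothetical (the original table is not reproduced in this text) and were chosen only so that they reproduce the joint distribution used in the rest of this answer. The names `counts`, `total` and `joint` are illustrative.

```python
# Hypothetical counts, NOT the asker's data: rows are motifs, columns are conditions.
counts = [
    [10, 5],    # motif 1 under condition 1, condition 2
    [40, 45],   # motif 2 under condition 1, condition 2
]

# Total number of observations in the table.
total = sum(sum(row) for row in counts)

# Empirical joint probabilities: each cell divided by the grand total.
joint = [[n / total for n in row] for row in counts]

for row in joint:
    print(row)
```

With real data, only `counts` needs to change; the entries of `joint` always sum to 1 (up to floating-point rounding).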

I'll use the shorthands $m_1$, $m_2$, $c_1$, $c_2$ for the motifs and conditions in your table. The approximation gives the following joint distribution $p(m_i, c_j)$:

     c_1   c_2
m_1  0.10  0.05
m_2  0.40  0.45

The marginal probabilities are obtained by summing the joint distribution along its rows and columns (see the example at https://en.wikipedia.org/wiki/Marginal_distribution). Here, for instance, $p(m_1) = 0.1 + 0.05 = 0.15$ and $p(c_1) = 0.1 + 0.4 = 0.5$.
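The row and column sums can be sketched in a few lines of Python. This is only an illustration: the variable names `p_motif` and `p_condition` are my own, and the joint table is the one given above.

```python
# Joint distribution p(m_i, c_j): rows are motifs, columns are conditions.
joint = [
    [0.1, 0.05],
    [0.4, 0.45],
]

# Marginal over motifs: sum each row across the conditions.
p_motif = [sum(row) for row in joint]

# Marginal over conditions: sum each column across the motifs.
p_condition = [sum(col) for col in zip(*joint)]

print(p_motif)
print(p_condition)
```

Up to floating-point rounding, this gives 0.15 and 0.85 for the motifs and 0.5 and 0.5 for the conditions.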

Then, the mutual information can be computed from its definition:

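$I(\text{motif}; \text{condition}) = \sum_{i=1}^{2} \sum_{j=1}^{2} p(m_i, c_j) \log\left(\frac{p(m_i, c_j)}{p(m_i)\, p(c_j)}\right)$

The base of the logarithm only sets the unit: base 2 gives bits, the natural logarithm gives nats. Cells with $p(m_i, c_j) = 0$ contribute nothing to the sum, by the usual convention $0 \log 0 = 0$.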
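Putting the definition into code, a minimal sketch in plain Python might look like the following. The function name `mutual_information` and its `base` parameter are my own choices rather than any library's API, and the result is the plug-in estimate obtained by treating the empirical probabilities as if they were the true ones.

```python
import math


def mutual_information(joint, base=math.e):
    """Mutual information of a joint probability table (list of rows)."""
    p_row = [sum(row) for row in joint]          # marginal over rows
    p_col = [sum(col) for col in zip(*joint)]    # marginal over columns

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_ij in enumerate(row):
            if p_ij > 0:  # empty cells contribute 0 (convention 0 * log 0 = 0)
                mi += p_ij * math.log(p_ij / (p_row[i] * p_col[j]))

    # math.log above is the natural log; divide to convert to the requested base.
    return mi / math.log(base)


joint = [
    [0.1, 0.05],
    [0.4, 0.45],
]

mi_nats = mutual_information(joint)
mi_bits = mutual_information(joint, base=2)
print(mi_nats, mi_bits)
```

For the table above this comes out to roughly 0.0100 nats, or about 0.0144 bits, so motif and condition are close to independent in this example.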
