Can someone explain how to calculate mutual information from a contingency table?
I have a contingency table containing counts from a sample of data.
I want to calculate the mutual information between motif and condition.
Since the mutual information formula requires probabilities, how can I estimate them from frequencies? Alternatively, how can I obtain the distribution of the mutual information?
Best Answer
I'm not a stats specialist, but I will give it a shot.
First, we can approximate the probability of each event by its empirical probability, i.e. the number of occurrences divided by the total number of trials:
$p(\text{motif}_i, \text{condition}_j) = \frac{\text{number of occurrences of motif } i \text{ with condition } j}{\sum_{k,l} \text{number of occurrences of motif } k \text{ with condition } l}$
I'll use the shorthands $m_1$, $m_2$, $c_1$, $c_2$ for the motifs and conditions in your table. The approximation gives the following joint distribution $p(m_i,c_j)$:
Marginal probabilities can be computed by summing the joint distribution over rows and columns (see the example at https://en.wikipedia.org/wiki/Marginal_distribution). Here, for instance, $p(m_1)=0.15$ and $p(c_1)=0.5$.
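In case it helps, here is a minimal NumPy sketch of these two steps (counts to joint distribution, then marginals). The counts are hypothetical, not the table from the question; I only picked them so that the marginals come out to the values quoted above. The variable names are mine as well.

```python
import numpy as np

# Hypothetical counts (NOT the asker's table): rows are motifs m_1, m_2,
# columns are conditions c_1, c_2.
counts = np.array([[10, 5],
                   [40, 45]], dtype=float)

# Empirical joint distribution p(m_i, c_j): each count divided by the total.
joint = counts / counts.sum()

# Marginals: sum the joint over columns for p(m_i), over rows for p(c_j).
p_m = joint.sum(axis=1)
p_c = joint.sum(axis=0)

print(joint)
print(p_m, p_c)
```

With these made-up counts, `p_m[0]` is 0.15 and `p_c[0]` is 0.5.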
Then, the mutual information can be computed from its definition:
$I(\text{motif};\text{condition}) = \sum_{i \in \{1,2\}} \sum_{j \in \{1,2\}} p(m_i,c_j)\log\left(\frac{p(m_i,c_j)}{p(m_i)\,p(c_j)}\right)$
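For what it's worth, the whole calculation can be sketched in a few lines of NumPy. This is just one way to write it, under a couple of assumptions: the function name `mutual_information` and its `base` parameter are my own, cells with zero count are skipped (the usual convention that $0 \log 0 = 0$), and the result is the plug-in estimate from the empirical probabilities, in nats by default or in bits with `base=2`.

```python
import numpy as np

def mutual_information(counts, base=np.e):
    """Plug-in estimate of mutual information from a table of counts.

    Hypothetical helper for illustration: rows index one variable,
    columns index the other.
    """
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()               # p(m_i, c_j)
    p_m = joint.sum(axis=1, keepdims=True)      # p(m_i), column vector
    p_c = joint.sum(axis=0, keepdims=True)      # p(c_j), row vector
    product = p_m * p_c                         # p(m_i) * p(c_j) for every cell

    # Skip empty cells: they contribute 0 by the convention 0 * log(0) = 0.
    nonzero = joint > 0
    terms = joint[nonzero] * np.log(joint[nonzero] / product[nonzero])

    # Divide by log(base) to convert from nats to the requested unit.
    return terms.sum() / np.log(base)

# Hypothetical counts, not the table from the question.
print(mutual_information([[10, 5], [40, 45]], base=2))
```

As a sanity check, a table whose rows are proportional to each other (e.g. `[[10, 10], [20, 20]]`) gives 0, and a perfectly dependent balanced table such as `[[1, 0], [0, 1]]` gives 1 bit with `base=2`.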