Python – How to Calculate Mutual Information Using Numpy and Pandas?

information-theory, mutual-information, numpy, pandas, python

I am a bit confused. Can someone explain to me how to calculate mutual information between two terms based on a term-document matrix with binary term occurrence as weights?

$$
\begin{matrix}
& 'Why' & 'How' & 'When' & 'Where' \\
Document1 & 1 & 1 & 1 & 1 \\
Document2 & 1 & 0 & 1 & 0 \\
Document3 & 1 & 1 & 1 & 0
\end{matrix}
$$

$$I(X;Y)= \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)p(y)} \right)$$

Thank you

Best Answer

How about forming a joint probability table holding the normalized co-occurrences across documents? From that table you can obtain the joint entropy and the two marginal entropies, and finally $$I(X;Y) = H(X)+H(Y)-H(X,Y). $$
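A minimal sketch of this recipe with numpy and pandas, assuming base-2 logarithms and the term-document matrix from the question (the function names are my own). It builds the joint probability table with `pd.crosstab`, computes the entropies, and also checks the result against the direct double-sum formula:

```python
import numpy as np
import pandas as pd

# Binary term-document matrix from the question.
df = pd.DataFrame(
    {"Why": [1, 1, 1], "How": [1, 0, 1], "When": [1, 1, 1], "Where": [1, 0, 0]},
    index=["Document1", "Document2", "Document3"],
)

def entropy(p):
    """Shannon entropy in bits; 0 log 0 = 0 by convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), probabilities estimated by counting."""
    joint = pd.crosstab(x, y, normalize=True)  # joint probability table
    p_x = joint.sum(axis=1).to_numpy()         # marginal of x
    p_y = joint.sum(axis=0).to_numpy()         # marginal of y
    return entropy(p_x) + entropy(p_y) - entropy(joint.to_numpy().ravel())

def mutual_information_direct(x, y):
    """Same quantity via the double sum over p(x,y) log(p(x,y)/(p(x)p(y)))."""
    joint = pd.crosstab(x, y, normalize=True)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    mi = 0.0
    for xi in joint.index:
        for yi in joint.columns:
            p = joint.loc[xi, yi]
            if p > 0:
                mi += p * np.log2(p / (p_x[xi] * p_y[yi]))
    return mi

mi = mutual_information(df["How"], df["Where"])
```

Note that 'Why' occurs in every document, so its marginal entropy is zero and its mutual information with any other term is zero; 'How' and 'Where' give a nonzero value.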
