Even though they look similar, they are quite different things. Let's start with the major differences.
$h$ is something different in PMI and in WOE
Notice the term $p(h)$ in PMI. This implies that $h$ is a random variable whose probability you can compute. For a Bayesian that is no problem, but if you do not believe that hypotheses can have a prior probability, you cannot even write the PMI of hypothesis and evidence. In WOE, $h$ is a parameter of the distribution and the expressions are always defined.
PMI is symmetric, WOE is not
Trivially, $pmi(e,h) = pmi(h,e)$. However, $w(h:e) = \log p(h|e)/p(h|\bar{e})$ need not be defined because of the term $\bar{e}$, and even when it is defined, it is in general not equal to $w(e:h)$.
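To make the asymmetry concrete, here is a small numeric sketch. The joint probabilities below are made up purely for illustration:

```python
import math

# Made-up joint distribution over evidence e and hypothesis h, both binary.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # keys: (e, h)

p_e = {e: joint[(e, 0)] + joint[(e, 1)] for e in (0, 1)}
p_h = {h: joint[(0, h)] + joint[(1, h)] for h in (0, 1)}

def pmi(e, h):
    # pmi(e, h) = log p(e, h) / (p(e) p(h)); symmetric, since p(e, h) = p(h, e)
    return math.log(joint[(e, h)] / (p_e[e] * p_h[h]))

# WOE as written above: w(h:e) = log p(h=1|e=1) / p(h=1|e=0)
w_he = math.log((joint[(1, 1)] / p_e[1]) / (joint[(0, 1)] / p_e[0]))
# Swap the roles: w(e:h) = log p(e=1|h=1) / p(e=1|h=0)
w_eh = math.log((joint[(1, 1)] / p_h[1]) / (joint[(1, 0)] / p_h[0]))

print(pmi(1, 1))   # same whichever way you order the arguments
print(w_he, w_eh)  # generally different: WOE is not symmetric
```

With these numbers, $w(h:e) = \log 3 \approx 1.10$ while $w(e:h) = \log 2.25 \approx 0.81$, whereas the PMI is the same in either direction.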
Other than that, WOE and PMI have similarities.
The weight of evidence says how much the evidence speaks in favor of a hypothesis. If it is 0, it means that it neither speaks for nor against. The higher it is, the more it validates hypothesis $h$, and the lower it is, the more it validates $\bar{h}$.
Mutual information quantifies how the occurrence of an event ($e$ or $h$) says something about the occurrence of the other event. If it is 0, the events are independent and the occurrence of one says nothing about the other. The higher it is the more often they co-occur, and the lower it is the more they are mutually exclusive.
What about the cases where the hypothesis $h$ is also a random variable, so that both quantities are well defined? For example, in communication over a binary noisy channel, the hypothesis $h$ is the emitted signal to decode and the evidence is the received signal. Say the probability of flipping is $1/1000$; then if you receive a $1$, the WOE for $1$ is $\log(0.999/0.001) = 6.90$. The PMI, on the other hand, depends on the probability of emitting a $1$. You can verify that as the probability of emitting a $1$ tends to $0$, the PMI tends to $6.90$, while it tends to $0$ as the probability of emitting a $1$ tends to $1$.
This paradoxical behavior illustrates two things:
Neither of them is suitable for making a guess about the emission. If the probability of emitting a $1$ drops below $1/1000$, the most likely emission is $0$ even when a $1$ is received; yet for small probabilities of emitting a $1$, both WOE and PMI are close to $6.90$.
PMI is a gain of (Shannon) information about the realization of the hypothesis: if the hypothesis is almost sure, no information is gained. WOE is an update of our prior odds, which does not depend on the value of those odds.
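The channel example above is easy to check numerically. A minimal sketch, using natural logarithms and the flip probability $1/1000$ from the example:

```python
import math

FLIP = 0.001  # channel flip probability from the example

def woe_for_one():
    # WOE for h=1 when a 1 is received:
    # log p(r=1 | sent=1) / p(r=1 | sent=0) -- independent of the prior.
    return math.log((1 - FLIP) / FLIP)

def pmi_for_one(q):
    # q = prior probability of emitting a 1.
    # pmi(r=1, s=1) = log p(r=1 | s=1) / p(r=1), which depends on q.
    p_r1 = q * (1 - FLIP) + (1 - q) * FLIP
    return math.log((1 - FLIP) / p_r1)

print(woe_for_one())          # ~6.91 regardless of the prior
print(pmi_for_one(1e-9))      # ~6.91: rare emissions, PMI close to the WOE
print(pmi_for_one(1 - 1e-9))  # ~0: near-certain emission, no information gained
```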
Mutual information $I(X, Y)$ can be thought of as a measure of the reduction in uncertainty about $X$ after observing $Y$:
$$ I(X, Y) = H(X) - H(X|Y)$$
where $H(X)$ is entropy of $X$ and $H(X|Y)$ is conditional entropy of $X$ given $Y$. By symmetry it follows that
$$ I(X, Y) = H(Y) - H(Y|X)$$
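Both decompositions can be verified numerically on a toy joint distribution. The table below is an arbitrary example chosen for illustration:

```python
import math

# Toy joint distribution p(x, y); the values are arbitrary but sum to 1.
joint = {(0, 0): 0.125, (0, 1): 0.375, (1, 0): 0.375, (1, 1): 0.125}

px = {x: sum(v for (xx, _), v in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(v for (_, yy), v in joint.items() if yy == y) for y in (0, 1)}

def H(p):
    """Entropy in bits of a discrete distribution given as {value: prob}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def H_cond(joint, marg, given_second):
    """H(A|B) = -sum_{a,b} p(a,b) log p(a,b)/p(b)."""
    total = 0.0
    for (x, y), v in joint.items():
        b = y if given_second else x
        total -= v * math.log2(v / marg[b])
    return total

I1 = H(px) - H_cond(joint, py, given_second=True)   # H(X) - H(X|Y)
I2 = H(py) - H_cond(joint, px, given_second=False)  # H(Y) - H(Y|X)
print(I1, I2)  # equal, as the symmetry of mutual information requires
```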
However, the mutual information of a variable with itself is equal to the entropy of that variable,
$$ I(X, X) = H(X)$$
and is called self-information. This holds because $H(X|Y) = 0$ whenever the values of $X$ are completely determined by the values of $Y$, which is certainly the case for $H(X|X)$: entropy is a measure of uncertainty, and there is no uncertainty about the values of $X$ given the values of $X$, so
$$ H(X) - H(X|X) = H(X) - 0 = H(X) $$
This is immediately obvious if you think of it in terms of Venn diagrams as illustrated below.
You can also show this using the formula for mutual information and substituting the conditional entropy part, i.e.
$$ H(X|Y) = -\sum_{x \in X, y \in Y} p(x, y) \log \frac{p(x,y)}{p(y)} $$
by changing the $y$'s into $x$'s, recalling that $X \cap X = X$, so $p(x, x) = p(x)$ and each logarithm becomes $\log 1 = 0$. [Notice that this is an informal argument, since for continuous variables $p(x, x)$ would not have a density function, while still having a cumulative distribution function.]
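The substitution argument is easy to replicate numerically: put all the mass of the "joint" distribution of $(X, X)$ on the diagonal and compute $H(X|X)$ and $I(X, X)$ directly. A small sketch with an arbitrary three-point distribution:

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # arbitrary example distribution

def H(dist):
    """Entropy in bits."""
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)

# Joint distribution of (X, X): all mass sits on the diagonal, p(x, x) = p(x).
joint = {(x, x): v for x, v in p.items()}

# H(X|X) = -sum_x p(x,x) log p(x,x)/p(x) = -sum_x p(x) log 1 = 0
H_x_given_x = -sum(v * math.log2(v / p[x]) for (x, _), v in joint.items())

I_xx = H(p) - H_x_given_x  # I(X, X) = H(X) - H(X|X)
print(H_x_given_x, I_xx)   # 0 and H(X) = 1.5 bits for this distribution
```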
So yes, if you know something about $X$, then learning again about $X$ gives you no more information.
Check Chapter 2 of Elements of Information Theory by Cover and Thomas, or Shannon's original 1948 paper itself, to learn more.
As for your second question: it is a common problem that your data does not contain some values that could possibly occur. In this case the classical estimator of the probability, i.e.
$$ \hat p = \frac{n_i}{\sum_i n_i} $$
where $n_i$ is the number of occurrences of the $i$th value (out of $d$ categories), gives $\hat p = 0$ if $n_i = 0$. This is called the zero-frequency problem. The easy and commonly applied fix is, as your professor told you, to add some constant $\beta$ to your counts, so that
$$ \hat p = \frac{n_i + \beta}{(\sum_i n_i) + d\beta} $$
The common choices for $\beta$ are $1$, i.e. a uniform prior based on Laplace's rule of succession; $1/2$ for the Krichevsky-Trofimov estimate; or $1/d$ for the Schurmann-Grassberger (1996) estimator. Notice, however, that what you are doing here is applying out-of-data (prior) information in your model, so it acquires a subjective, Bayesian flavor. When using this approach you have to remember the assumptions you made and take them into consideration.
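The smoothed estimator above is a one-liner in practice. A minimal sketch (function name is my own):

```python
from collections import Counter

def smoothed_probs(counts, d, beta=1.0):
    """Additive smoothing: p_i = (n_i + beta) / (N + d*beta).

    counts: mapping from category index to observed count;
    d: total number of categories, including unobserved ones;
    beta=1 is Laplace, 1/2 Krichevsky-Trofimov, 1/d Schurmann-Grassberger.
    """
    n = sum(counts.values())
    return {i: (counts.get(i, 0) + beta) / (n + d * beta) for i in range(d)}

# 3 categories, one of which never appears in the sample of size 5.
counts = Counter([0, 0, 1, 1, 1])           # category 2 is unobserved
print(smoothed_probs(counts, d=3, beta=1))  # category 2 gets (0+1)/(5+3) = 0.125
```

Note that the estimates always sum to $1$ and no category is ever assigned zero probability, which is exactly what downstream entropy or PMI computations need.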
This approach is commonly used, e.g. in the R entropy package. You can find some further information in the following paper:
Schurmann, T., and P. Grassberger. (1996). Entropy estimation of symbol sequences. Chaos, 6, 414-427.
From Wikipedia entry on pointwise mutual information:
Why does it happen? Well, the definition of pointwise mutual information is
$$ pmi \equiv \log \left[ \frac{p(x,y)}{p(x)p(y)} \right] = \log p(x,y) - \log p(x) - \log p(y), $$
whereas normalized pointwise mutual information is:
$$ npmi \equiv \frac{pmi}{-\log p(x,y)} = \frac{\log[ p(x) p(y)]}{\log p(x,y)} - 1. $$
Thus, when $x$ and $y$ always occur together, so that $p(x,y) = p(x) = p(y)$, npmi attains its maximum of $\log[p(x)^2]/\log p(x) - 1 = 1$; as co-occurrences become vanishingly rare ($p(x,y) \to 0$ with the marginals fixed), npmi tends to its minimum of $-1$; and under independence, pmi $= 0$ and hence npmi $= 0$.
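The npmi formula above is easy to sanity-check numerically (natural logs here; the base cancels in the ratio):

```python
import math

def pmi(pxy, px, py):
    """Pointwise mutual information: log p(x,y) / (p(x) p(y))."""
    return math.log(pxy / (px * py))

def npmi(pxy, px, py):
    """Normalized PMI: pmi / (-log p(x,y)), bounded in [-1, 1]."""
    return pmi(pxy, px, py) / (-math.log(pxy))

# Algebraic identity: npmi = log[p(x) p(y)] / log p(x,y) - 1
pxy, px, py = 0.1, 0.25, 0.2
print(npmi(pxy, px, py) - (math.log(px * py) / math.log(pxy) - 1))  # ~0

# Perfect co-occurrence, p(x,y) = p(x) = p(y): npmi = 1
print(npmi(0.3, 0.3, 0.3))

# Nearly never co-occurring: npmi approaches -1 as p(x,y) -> 0
print(npmi(1e-300, 0.5, 0.5))
```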