Solved – Why do people use the term “weight of evidence” and how does it differ from “pointwise mutual information”

bayesian, mutual-information, probability

Here, "weight of evidence" (WOE) is a common term in the published scientific and policy-making literature, most often seen in the context of risk assessment, defined by:

$$w(e : h) = \log\frac{p(e|h)}{p(e|\overline{h})}$$

where $e$ is the evidence, $h$ is the hypothesis, and $\overline{h}$ is its negation.

Now, I want to know what the main difference is between WOE and pointwise mutual information (PMI):

$$pmi(e,h)=\log\frac{p(e,h)}{p(e)\,p(h)}$$
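
For concreteness, here is a minimal sketch in Python of how I would compute the two quantities directly from their definitions (the probability values are made up for illustration):

```python
import math

def woe(p_e_given_h, p_e_given_not_h):
    # w(e:h) = log p(e|h) / p(e|not-h), from the two likelihoods
    return math.log(p_e_given_h / p_e_given_not_h)

def pmi(p_eh, p_e, p_h):
    # pmi(e,h) = log p(e,h) / (p(e) p(h)), from joint and marginals
    return math.log(p_eh / (p_e * p_h))

print(woe(0.9, 0.2))        # ~1.50: the evidence favors h
print(pmi(0.18, 0.3, 0.5))  # ~0.18: e and h co-occur more than chance
```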

Best Answer

Even though they look similar, they are quite different things. Let's start with the major differences.

  • $h$ is something different in PMI and in WOE
    Notice the term $p(h)$ in PMI. This implies that $h$ is a random variable whose probability you can compute. For a Bayesian, that's no problem, but if you do not believe that hypotheses can have a prior probability, you cannot even write the PMI of hypothesis and evidence. In WOE, $h$ is a parameter of the distribution, and the expression is always defined.

  • PMI is symmetric, WOE is not
    Trivially, $pmi(e,h) = pmi(h,e)$. However, $w(h:e) = \log p(h|e)/p(h|\bar{e})$ need not even be defined, because of the conditioning on $\bar{e}$; and even when it is defined, it is in general not equal to $w(e:h)$ (see the numerical check after this list).
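
Here is a quick numerical check of this asymmetry, a sketch in Python over an assumed joint distribution of two binary variables (the numbers are arbitrary):

```python
import math

# Made-up joint distribution p(e, h) over two binary variables
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def p_e(e):  # marginal probability of the evidence
    return p[(e, 0)] + p[(e, 1)]

def p_h(h):  # marginal probability of the hypothesis
    return p[(0, h)] + p[(1, h)]

def woe_eh(e, h):
    # w(e:h) = log p(e|h) / p(e|not-h)
    return math.log((p[(e, h)] / p_h(h)) / (p[(e, 1 - h)] / p_h(1 - h)))

def woe_he(h, e):
    # w(h:e) = log p(h|e) / p(h|not-e)
    return math.log((p[(e, h)] / p_e(e)) / (p[(1 - e, h)] / p_e(1 - e)))

def pmi(e, h):
    # pmi(e,h) = pmi(h,e) by the symmetry of the formula
    return math.log(p[(e, h)] / (p_e(e) * p_h(h)))

print(pmi(1, 1))     # ~0.41
print(woe_eh(1, 1))  # ~0.81
print(woe_he(1, 1))  # ~1.10, not equal to w(e:h)
```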

Other than that, WOE and PMI have similarities.

The weight of evidence says how much the evidence speaks in favor of a hypothesis. If it is 0, the evidence neither speaks for nor against. The higher it is, the more it supports $h$; the lower it is, the more it supports $\bar{h}$.

Pointwise mutual information quantifies how much the occurrence of one event ($e$ or $h$) tells you about the occurrence of the other. If it is 0, the events are independent, and the occurrence of one says nothing about the other. The higher it is, the more often they co-occur; the lower it is, the closer they are to being mutually exclusive.

What about cases where the hypothesis $h$ is also a random variable, so that both quantities are defined? For example, in communication over a noisy binary channel, the hypothesis $h$ is the emitted signal to decode and the evidence $e$ is the received signal. Say the probability of a flip is $1/1000$; then if you receive a $1$, the WOE in favor of an emitted $1$ is $\log(0.999/0.001) \approx 6.9$. The PMI, on the other hand, depends on the probability of emitting a $1$: you can verify that as the probability of emitting a $1$ tends to $0$, the PMI tends to $6.9$, while it tends to $0$ as the probability of emitting a $1$ tends to $1$.
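
A sketch of this computation in Python (natural logarithms throughout, matching the $\approx 6.9$ above; the function names are mine):

```python
import math

eps = 0.001  # flip probability of the binary channel

def woe_received_1():
    # w(e=1 : h=1) = log p(e=1|h=1) / p(e=1|h=0); no prior involved
    return math.log((1 - eps) / eps)

def pmi_received_1(q):
    # pmi(e=1, h=1) = log p(e=1|h=1) / p(e=1), which depends on the
    # prior q = p(emit 1) through the marginal p(e=1)
    p_e1 = q * (1 - eps) + (1 - q) * eps
    return math.log((1 - eps) / p_e1)

print(woe_received_1())          # ~6.9, regardless of q
for q in (1e-6, 0.5, 1 - 1e-6):
    print(q, pmi_received_1(q))  # ~6.9 as q -> 0, ~0 as q -> 1
```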

This paradoxical behavior illustrates two things:

  1. Neither of them is suitable for making a guess about the emission. If the probability of emitting a $1$ drops below $1/1000$, the most likely emission is $0$ even when a $1$ is received; yet for small probabilities of emitting a $1$, both WOE and PMI stay close to $6.9$.

  2. PMI is a gain of (Shannon) information about the realization of the hypothesis: if the hypothesis is already almost certain, no information is gained. WOE is an update of our prior odds, and it does not depend on the value of those odds; the sketch below makes this concrete.
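
To spell out point 2, Bayes' theorem in log-odds form reads $\log\frac{p(h|e)}{p(\bar{h}|e)} = \log\frac{p(h)}{p(\bar{h})} + w(e:h)$, so WOE is exactly the additive update to the prior log-odds. A small check with the channel numbers from above (a sketch; the helper name is mine):

```python
import math

eps = 0.001  # flip probability of the binary channel

def posterior_log_odds_of_1(q):
    # log p(h=1|e=1)/p(h=0|e=1) = log q/(1-q) + w(e=1 : h=1),
    # where q is the prior probability of emitting a 1
    prior_log_odds = math.log(q / (1 - q))
    woe = math.log((1 - eps) / eps)  # ~6.9, independent of q
    return prior_log_odds + woe

# The sign flips at q = 1/1000: below it, the most likely emission
# is 0 even after receiving a 1 (point 1 above).
for q in (0.0005, 0.001, 0.002):
    print(q, posterior_log_odds_of_1(q))
```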
