[Math] Why is the mutual information nonzero for two independent variables

information theory

Suppose we have two independent variables X and Y. Intuitively, the mutual information I(X,Y) between the two should be zero, since knowing one tells us nothing about the other.

The math behind this also checks out from the definition of mutual information (https://en.wikipedia.org/wiki/Mutual_information).
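
For reference, for discrete variables the definition reads

$$I(X,Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},$$

which is zero exactly when p(x,y) = p(x)p(y) for all pairs, i.e. when X and Y are independent.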

Now let us actually compute it. First, generate two simple random vectors of length 10 in R:

# draw 10 values without replacement from 1..100 and from 1000..10000
X <- sample(seq(1, 100), 10)
Y <- sample(seq(1000, 10000), 10)

I got these:

X={3, 35, 93, 13, 90, 89, 34, 97, 49, 82}
Y={7611, 5041, 2612, 4273, 6714, 4391, 1000, 6657, 8736, 2443}

The mutual information can be expressed in terms of the entropies H(X) and H(Y) and the joint entropy of X and Y, H(X,Y):

I(X,Y) = H(X) + H(Y) - H(X,Y)

Moreover

H(X) = -10*[(1/10)*log(1/10)] = log(10)

since each observation occurs only once and thus has an empirical frequency of 1/10. The maximum entropy of a discrete random variable taking N distinct values is log(N), so this calculation checks out.

Similarly

H(Y) = log(10)
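
As a sanity check, here is a minimal sketch of this plug-in (empirical) entropy in base R; emp_entropy is just an illustrative helper name, not a function from any package:

# plug-in entropy in nats, from the observed relative frequencies
emp_entropy <- function(v) {
  p <- table(v) / length(v)   # here every value occurs once, so each p is 1/10
  -sum(p * log(p))
}
emp_entropy(X)   # log(10) ~ 2.303
emp_entropy(Y)   # log(10) ~ 2.303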

The joint entropy is computed in the same way as the individual entropies, but this time we count the frequencies of pairs of observations. For example, the pair {X=3, Y=7611} occurs only once out of a total of 10 paired observations, hence it has a frequency of 1/10. Therefore:

H(X,Y) = -10*[(1/10)*log(1/10)] = log(10)

since each paired observation occurs only once.
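
Continuing the sketch above (again with the illustrative emp_entropy helper), this joint entropy can be computed by treating each (X,Y) pair as a single symbol:

# joint plug-in entropy: each observed (x, y) pair becomes one symbol
emp_entropy(paste(X, Y))   # every pair is unique, so this is also log(10)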

So

I(X,Y) = log(10) + log(10) - log(10) = log(10)

which is clearly nonzero. This is also the result that various R packages (e.g. infotheo) produce.
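
For completeness, the infotheo check might look like this (a sketch assuming the package is installed; mutinformation treats the integer vectors as discrete symbols and reports the result in nats):

library(infotheo)
mutinformation(X, Y)   # empirical estimate; about 2.303 = log(10) for the vectors above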

The question is: where is the mistake in my thinking? Why is I(X,Y) not zero?

Best Answer

I believe you were on the right track, but you made a small mistake when calculating the joint entropy. Since X and Y are independent, each of the 10 values of X can occur together with each of the 10 values of Y, so there are 100 equally likely pairs of symbols and the joint entropy is $\log 100 = 2\log 10$, which makes the mutual information zero. Your sample of 10 observations only ever shows 10 of those pairs, so the plug-in estimate of H(X,Y) comes out as $\log 10$ and the estimated mutual information is inflated.
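
To illustrate (a small sketch in R; the sample size and seed are my own illustrative choices, not from the question): with 10 equally likely symbols per variable and a sample large enough for all 100 pairs to appear near their true frequency of 1/100, the plug-in estimate of I(X,Y) falls to roughly zero.

set.seed(1)                              # illustrative seed
n  <- 1e5
Xb <- sample(1:10, n, replace = TRUE)    # independent draws
Yb <- sample(1:10, n, replace = TRUE)
emp_entropy <- function(v) { p <- table(v) / length(v); -sum(p * log(p)) }
# H(X) + H(Y) - H(X,Y): close to 0, up to a small positive plug-in bias
emp_entropy(Xb) + emp_entropy(Yb) - emp_entropy(paste(Xb, Yb))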
