Solved – An intuitive interpretation of mutual information values (bits, nats)

information theory, intuition, mutual information

I understand how mutual information is calculated, and what it is addressing: how much the distribution of one variable changes conditional on the value of another variable. But I don't really understand what the output values of a mutual information calculation mean in an absolute sense. I know that 0 means the variables are independent, and I know I can use these values in relative comparisons for feature selection without going any deeper than that, but I'd still like to understand what the absolute values mean. For example (using Python with this MI implementation):

$$
X \sim U(0, 1);\quad n = 10000 \\
Y = X + U(0, 0.5) \\
MI(X, Y) \approx 0.92
$$

What does it mean that $MI(X, Y) \approx 0.92$ nats? Is this value actually related to the maximum compressibility of the data, or of the relationship between the variables? Or is it something else entirely?
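
For concreteness, here is a minimal sketch of that setup. Note this uses scikit-learn's `mutual_info_regression` as a stand-in estimator; the implementation linked above may give slightly different numbers.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# A minimal sketch of the setup above. scikit-learn's k-NN based estimator
# is a stand-in here; the implementation linked above may differ slightly.
rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(0.0, 1.0, n)
y = x + rng.uniform(0.0, 0.5, n)

# mutual_info_regression takes a 2-D feature matrix and returns MI in nats
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)
print(mi[0])  # roughly 0.9 nats
```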

Best Answer

Let's begin with the definition of entropy.

Quoting from Wikipedia:

The entropy rate of a data source means the average number of bits per symbol needed to encode it.

If we use a fair coin, we will need one bit per toss to store the outcomes. If instead we use a coin $X$ whose probability of heads is 0.999, we can get away with far fewer bits on average:

$$H(X) = -\left(0.999 \log_2 0.999 + 0.001 \log_2 0.001\right) \approx 0.0114$$

You can use techniques like Huffman coding (applied to blocks of tosses) to store the outcomes nearly this efficiently.
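
As a quick sanity check of that figure in Python:

```python
import math

# Average number of bits per toss: a fair coin needs a full bit,
# while the heavily biased coin needs only ~0.0114 bits on average.
for p in (0.5, 0.999):
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    print(f"P(heads) = {p}: {h:.4f} bits per toss")
```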

Mutual information $MI(X, Y)$ measures how many bits you save, on average, when encoding the outcomes of $Y$ once you already know the value of $X$: it is the reduction in entropy, $MI(X, Y) = H(Y) - H(Y \mid X)$.
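
For continuous variables the same identity holds with differential entropies, and for the question's example it can even be evaluated in closed form. Since $Y = X + \varepsilon$ with $\varepsilon \sim U(0, 0.5)$ independent of $X$:

$$h(Y \mid X) = h(\varepsilon) = \ln 0.5 \approx -0.693 \text{ nats}, \qquad h(Y) = \tfrac{1}{4} \text{ nats}$$

(the latter from integrating $-f \ln f$ over the trapezoidal density of $Y$), so

$$MI(X, Y) = h(Y) - h(Y \mid X) \approx 0.25 + 0.693 = 0.943 \text{ nats},$$

which is the quantity the estimate of $0.92$ nats is approximating.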

The bits/nats distinction comes from the base of the logarithm used in the entropy and mutual information formulas.

If you use logarithms base 2, you get bits; if you use the natural logarithm ($\ln$), you get nats. Since computers store data in binary, bits are the more common and intuitive unit.
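
Converting between the two is just a change of logarithm base (divide nats by $\ln 2 \approx 0.693$ to get bits). For the value in the question:

$$MI(X, Y) \approx 0.92 \text{ nats} = \frac{0.92}{\ln 2} \text{ bits} \approx 1.33 \text{ bits}$$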
