[Math] Algorithm for calculating the mutual information between continuous variables

computational mathematics, computer science, information theory, probability

I'm trying to use the notion of mutual information between continuous variables in a piece of software.
https://en.wikipedia.org/wiki/Mutual_information

The inputs of the algorithm would be two lists l1 and l2 of n real numbers and the output would be a real number m that represents the mutual information between the values in l1 and l2.

But I don't know how I can translate that formula into an algorithm. Can you help me?

Best regards.

Best Answer

Estimating mutual information fast and accurately is non-trivial. I recommend using a library; otherwise you will have to dig into the literature (e.g. see Kraskov et al., 2004, Estimating Mutual Information).
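For instance (a minimal sketch, assuming Python with scikit-learn installed), `mutual_info_regression` implements a nearest-neighbor estimator based on the Kraskov et al. approach; the data below is made up purely for illustration:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
l1 = rng.normal(size=1000)
l2 = l1 + 0.5 * rng.normal(size=1000)   # l2 is correlated with l1

# X must be 2D (n_samples, n_features); y is 1D
mi = mutual_info_regression(l1.reshape(-1, 1), l2, n_neighbors=3)
print(mi[0])  # estimated mutual information, in nats
```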


However, you can get an estimate naively as follows. You talk about two lists of values, $X$ and $Y$; hence, you can estimate the discrete rather than continuous mutual information: $$ I(X,Y) = \sum_{x\in X}\sum_{y\in Y} P(x,y) \log\left( \frac{P(x,y)}{P(x)\,P(y)} \right) $$ where $x\in X$ means $x$ runs over the range of $X$. The formula just requires that you know $P(x)$, $P(y)$, and $P(x,y)$, which you can estimate from your data (e.g. kernel density estimation, or fitting a Gaussian mixture model).
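To make the naive route concrete, here is a minimal sketch (not library code; the function name and bin count are arbitrary choices) that discretizes both lists into bins and uses empirical bin frequencies as the estimates of $P(x)$, $P(y)$, and $P(x,y)$:

```python
import numpy as np

def discrete_mutual_information(l1, l2, bins=20):
    """Naive discrete MI estimate: bin the data, use empirical frequencies."""
    joint_counts, _, _ = np.histogram2d(l1, l2, bins=bins)
    p_xy = joint_counts / joint_counts.sum()   # P(x, y)
    p_x = p_xy.sum(axis=1)                     # P(x), marginalized over y
    p_y = p_xy.sum(axis=0)                     # P(y), marginalized over x

    mi = 0.0
    for i in range(p_xy.shape[0]):
        for j in range(p_xy.shape[1]):
            if p_xy[i, j] > 0:                 # empty cells contribute 0 (0 log 0 = 0)
                mi += p_xy[i, j] * np.log(p_xy[i, j] / (p_x[i] * p_y[j]))
    return mi  # in nats; divide by np.log(2) for bits
```

The result depends on the bin count, so treat it as a rough estimate rather than a precise value.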

Maybe slightly better (and just as easy with libraries) is to (1) estimate the joint and marginal densities $P(x)$, $P(y)$, and $P(x,y)$ via density estimation (e.g. in Python, see scipy or sklearn) and (2) numerically integrate the resulting density functions (e.g. in scipy) using the continuous formula: $$ I(X,Y) = \int_{Y}\int_{X} P(x,y) \log\left( \frac{P(x,y)}{P(x)\,P(y)} \right)dx\,dy $$ so it boils down to running a density estimation algorithm followed by a numerical integration algorithm.
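A minimal sketch of that two-step recipe, assuming scipy is available: `gaussian_kde` for step (1) and a simple grid-based Riemann sum for step (2) (the function name and grid size are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

def continuous_mutual_information(l1, l2, grid_size=100):
    """KDE-based MI estimate: fit densities, then integrate over a grid."""
    l1, l2 = np.asarray(l1), np.asarray(l2)

    # Step (1): estimate joint and marginal densities with Gaussian KDE
    kde_xy = gaussian_kde(np.vstack([l1, l2]))
    kde_x = gaussian_kde(l1)
    kde_y = gaussian_kde(l2)

    # Step (2): integrate the continuous formula numerically on a regular grid
    xs = np.linspace(l1.min(), l1.max(), grid_size)
    ys = np.linspace(l2.min(), l2.max(), grid_size)
    dx, dy = xs[1] - xs[0], ys[1] - ys[0]
    gx, gy = np.meshgrid(xs, ys)               # x varies along columns, y along rows

    p_xy = kde_xy(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
    p_x = kde_x(xs)[np.newaxis, :]             # broadcast along rows
    p_y = kde_y(ys)[:, np.newaxis]             # broadcast along columns

    # Gaussian KDE densities are strictly positive, so the log is well-defined
    integrand = p_xy * np.log(p_xy / (p_x * p_y))
    return integrand.sum() * dx * dy
```

You could swap the grid sum for `scipy.integrate.dblquad` to get adaptive integration, at the cost of many more density evaluations.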

You can also check out this question, which first estimates densities with histograms and then finishes the calculation using the representation of mutual information via Shannon entropy, i.e. $$ I(X,Y) = H(X) + H(Y) - H(X,Y) $$ where $H$ is the information entropy.
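Along those lines, a small sketch of the entropy-based route using the same histogram estimates (again, names and bin count are illustrative, not the code from that question):

```python
import numpy as np

def entropy_mutual_information(l1, l2, bins=20):
    """MI via I(X,Y) = H(X) + H(Y) - H(X,Y), with histogram density estimates."""
    joint_counts, _, _ = np.histogram2d(l1, l2, bins=bins)
    p_xy = joint_counts / joint_counts.sum()
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                  # 0 log 0 contributes nothing
        return -np.sum(p * np.log(p))

    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
```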


(1/11/19) Since this has become of greater importance in certain fields (AI, data science, and machine learning, for example), I'll just mention some of the literature in those areas that uses/requires mutual information estimation.

For instance, infoGAN and HFVAE use Monte Carlo estimates of bounds of the information and/or entropy (also called variational information estimates). See also MINE and Deep InfoMax. Note that these methods (and the methods that cite or are cited by them) are often more concerned with optimizing mutual information, not just estimating it, which of course has different requirements.