Solved – Estimating Mutual Information using R

estimationmutual informationr

I am trying to estimate the mutual information between vit level (values vary from 4 to 70, all of which are whole numbers) and a binary variable that indicates the presence of polyps. I am not sure if I should first discretize the vit variable. If I do, I am not sure how many bins to choose. Any pointers would be appreciated.

When I draw box-plots, I can see there is some association but my code estimates mutual information of 0.006.

This is my R code at the moment:

freqs2d = rbind( vit, polyps)
H1 = entropy.plugin(rowSums(freqs2d))
H2 = entropy.plugin(colSums(freqs2d))
H12 = entropy.plugin(freqs2d)
H1+H2-H12

Best Answer

I believe you don't need to discretize your vit variable because it is already discrete. You might also want to use mi.plugin to calculate mutual information instead:

library(entropy)

set.seed(2017)  # For reproducibility

# 100 observations of a discrete variable between 4 and 70
vit = as.integer(runif(n = 100, min = 4, max = 70))

# 100 binary observations
polyps = rbinom(n = 100, size = 1, prob = 0.5)

# MI "by hand"
freqs2d = rbind( vit, polyps)
H1 = entropy.plugin(rowSums(freqs2d))
H2 = entropy.plugin(colSums(freqs2d))
H12 = entropy.plugin(freqs2d)
H1+H2-H12 # outputs 0.01076491

# Calculate mutual information
mi.plugin(freqs2d)  # outputs 0.01076491

Related Solutions

Solved – Pointwise mutual information for text using R

There are many functions for estimating the mutual information or the entropy in R, for example the entropy package. Enter

install.packages("entropy")

at the R-prompt. You can then use the property that $pmi(x;y) = h(x) + h(y) - h(xy) $ to calculate the pointwise mutual information. You need to obtain frequency estimates for the two random variables first.

Solved – Estimating the mutual information for two signal samples

I cannot immediately see a bug in your program. However, I see I few things that might alter the outcome of our results.

First of all, I would use $\sum_{ij} p_{ij} \left(\log p_{ij} - \log p_i - \log p_j\right)$ instead of $\sum_{ij} p_{ij} \log \frac{p_{ij}}{p_i\cdot p_j}$ for numerical stability. As a general rule, if you have to multiply probabilities, it is better to work in the log-domain.

Second, since you use different kernel density estimators for the marginals and the joint distribution, they might not be consistent. Let $p(x,y)$ be your joint estimate and $q(x)$ the estimate of one of your marginals. Since $q$ is the marginal of the joint distribution it must hold that $\int p(x,y) dy = p(x) = q(x)$. This is not necessarily the case since you used two different estimators.

Another thing that might get you in trouble is that you interpolate. If you interpolate a density, it will most certainly not integrate to one anymore (that, of course, depends on the type of interpolation you use). This means that you don't even use proper densities anymore.

The easiest solution you could just try out is to use a 2d histogram $H_{ij}$. Let $h_{ij}=H_{ij}/\sum_{ij}H_{ij}$ be the normalized histogram. Then you get the marginals via $h_i = \sum_j h_{ij}$ and $h_j = \sum_i h_{ij}$. The mutual information can then be computed via

lh1 = log(sum(h,1));
lh2 = log(sum(h,2));
I = sum(sum(h .* bsxfun(@minus,bsxfun(@minus,log(h),lh1),lh2) ));

For increasing number of datapoints and smaller bins, this should converge to the correct value. In our case, I would try to use different bin sizes, compute the MI and take a bin size from the region where the value of the MI seems stable. Alternatively, you could use the heuristic by

Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66(3), 605-610. doi:10.1093/biomet/66.3.605

to choose the bin size.

If you really want to use kernel density estimators, I would adapt the approach from above and

compute the marginals from the joint estimation
not use interpolation. As I understand, the KDE give you the values of the density at arbitrary base points anyway. Better use those.

One final note, while the MI converges to the true MI with the histogram approach, the entropy does not. So if you want to estimate the differential entropy with a histogram approach, you have to correct for the bin size.

Best Answer

Related Solutions

Solved – Pointwise mutual information for text using R

Solved – Estimating the mutual information for two signal samples

Related Question