Solved – Estimating Mutual Information using R

Tags: estimation, mutual information, r

I am trying to estimate the mutual information between vit level (values vary from 4 to 70, all of which are whole numbers) and a binary variable that indicates the presence of polyps. I am not sure if I should first discretize the vit variable. If I do, I am not sure how many bins to choose. Any pointers would be appreciated.

When I draw box-plots, I can see there is some association, yet my code estimates the mutual information at only 0.006.

This is my R code at the moment:

library(entropy)

freqs2d = rbind(vit, polyps)
H1 = entropy.plugin(rowSums(freqs2d))
H2 = entropy.plugin(colSums(freqs2d))
H12 = entropy.plugin(freqs2d)
H1+H2-H12

Best Answer

Since vit only takes whole-number values, it is already discrete, so you don't strictly need to discretize it further. Note, however, that entropy.plugin and mi.plugin expect a table of counts: rbind(vit, polyps) just stacks the two raw vectors into a 2 × 100 matrix, so the number you get back is not a meaningful mutual information estimate. Build a contingency table with table() first, and you can then use mi.plugin to compute the mutual information directly:

library(entropy)

set.seed(2017)  # for reproducibility

# 100 observations of a discrete variable between 4 and 70
vit = as.integer(runif(n = 100, min = 4, max = 70))

# 100 binary observations
polyps = rbinom(n = 100, size = 1, prob = 0.5)

# Joint table of counts, not the raw vectors
freqs2d = table(vit, polyps)

# MI "by hand": I(X; Y) = H(X) + H(Y) - H(X, Y)
H1  = entropy.plugin(rowSums(freqs2d))
H2  = entropy.plugin(colSums(freqs2d))
H12 = entropy.plugin(freqs2d)
H1 + H2 - H12

# The same value, computed directly
mi.plugin(freqs2d)  # agrees with the by-hand estimate above
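On the binning part of the question: if you ever do want to bin a genuinely continuous variable, the entropy package also provides discretize2d(), which cuts two vectors into equal-width bins and returns a joint count table. A sketch on the same simulated data; the choice of 8 bins below is purely illustrative, not a recommendation:

```r
library(entropy)

set.seed(2017)  # same simulated data as above
vit = as.integer(runif(n = 100, min = 4, max = 70))
polyps = rbinom(n = 100, size = 1, prob = 0.5)

# Bin vit into 8 equal-width bins and polyps into 2 bins,
# producing an 8 x 2 table of joint counts
binned = discretize2d(vit, polyps, numBins1 = 8, numBins2 = 2)

# Plug-in MI estimate on the binned table
mi.plugin(binned)
```

With only 100 observations, a coarse binning like this leaves fewer sparse cells than the full 67-level table, which reduces the upward small-sample bias of the plug-in estimator.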