There are several functions for estimating mutual information or entropy in R, for example in the entropy package. Enter
install.packages("entropy")
at the R prompt.
You can then use the identity
$I(X;Y) = H(X) + H(Y) - H(X,Y)$
to calculate the mutual information.
You need to obtain frequency estimates for the two random variables first.
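For example, a minimal sketch with the entropy package (the data and all variable names here are mine, purely for illustration):

library(entropy)

# placeholder data
x <- rnorm(1000)
y <- x + rnorm(1000)

# joint bin counts on a 10x10 grid (bin numbers chosen arbitrarily)
c2d <- discretize2d(x, y, numBins1 = 10, numBins2 = 10)

# I(X;Y) = H(X) + H(Y) - H(X,Y), all estimated from the same 2d count table
H_x  <- entropy(rowSums(c2d))
H_y  <- entropy(colSums(c2d))
H_xy <- entropy(as.vector(c2d))
MI   <- H_x + H_y - H_xy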
I cannot immediately see a bug in your program. However, I do see a few things that might alter your results.
First of all, I would use $\sum_{ij} p_{ij} \left(\log p_{ij} - \log p_i - \log p_j\right)$ instead of $\sum_{ij} p_{ij} \log \frac{p_{ij}}{p_i\cdot p_j}$ for numerical stability. As a general rule, if you have to multiply probabilities, it is better to work in the log domain.
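To illustrate, a small R helper along those lines (a sketch; p is assumed to be a matrix of joint probabilities):

mi_log <- function(p) {
  p_i <- rowSums(p)   # p_i = sum_j p_ij
  p_j <- colSums(p)   # p_j = sum_i p_ij
  idx <- p > 0        # empty cells: 0*log(0) terms contribute nothing
  sum(p[idx] * (log(p[idx]) - log(p_i[row(p)[idx]]) - log(p_j[col(p)[idx]])))
}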
Second, since you use different kernel density estimators for the marginals and the joint distribution, they might not be consistent. Let $p(x,y)$ be your joint estimate and $q(x)$ the estimate of one of your marginals. If the two estimates were consistent, the marginal of the joint would have to match the separate estimate, i.e. $\int p(x,y)\,dy = q(x)$. This need not hold when the two densities are estimated independently.
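You can check this on an evaluation grid; here pxy is assumed to be the joint estimate on a regular grid, qx the separate marginal estimate at the same x-values, and dy the grid spacing in y:

px_joint <- rowSums(pxy) * dy   # marginal implied by the joint estimate (Riemann sum)
max(abs(px_joint - qx))         # discrepancy against the separately estimated marginal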
Another thing that might get you in trouble is the interpolation. If you interpolate a density, it will most likely not integrate to one anymore (though that depends on the type of interpolation you use). This means you are no longer working with proper densities.
The easiest solution to try is a 2d histogram $H_{ij}$. Let $h_{ij}=H_{ij}/\sum_{ij}H_{ij}$ be the normalized histogram. Then you get the marginals via $h_i = \sum_j h_{ij}$ and $h_j = \sum_i h_{ij}$. The mutual information can then be computed via
lh1 = log(sum(h,1));   % log h_j = log(sum_i h_ij), a row vector
lh2 = log(sum(h,2));   % log h_i = log(sum_j h_ij), a column vector
T = h .* bsxfun(@minus, bsxfun(@minus, log(h), lh1), lh2);
T(h == 0) = 0;         % empty bins give 0*log(0) = NaN; their contribution is 0
I = sum(T(:));         % MI in nats
As the number of data points increases and the bins get smaller, this should converge to the correct value. In your case, I would try different bin sizes, compute the MI for each, and take a bin size from the region where the MI value seems stable (a sketch of this sweep follows below). Alternatively, you could use the heuristic by
Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66(3), 605-610. doi:10.1093/biomet/66.3.605
to choose the bin size.
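For illustration, such a stability sweep could look like this in R (reusing the mi_log helper from above; base R's nclass.scott implements Scott's rule for the number of bins):

for (nb in c(5, 10, 20, 40, 80)) {
  h <- table(cut(x, nb), cut(y, nb))   # 2d histogram with nb bins per axis
  print(c(nb, mi_log(h / sum(h))))     # look for a plateau in the MI estimate
}
nclass.scott(x)                        # Scott's rule as an alternative choice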
If you really want to use kernel density estimators, I would adapt the approach from above and
- compute the marginals from the joint estimate, and
- not use interpolation. As I understand it, the KDE gives you the values of the density at arbitrary evaluation points anyway, so use those directly (see the sketch below).
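A sketch along those lines with MASS::kde2d (the choice of kde2d and all variable names are my assumptions; any 2d KDE evaluated on a grid works the same way):

library(MASS)

k  <- kde2d(x, y, n = 100)               # joint KDE on a 100x100 grid
dx <- diff(k$x[1:2]); dy <- diff(k$y[1:2])
p  <- k$z / (sum(k$z) * dx * dy)         # renormalize the density on the grid
px <- rowSums(p) * dy                    # marginal of x, taken from the joint
py <- colSums(p) * dx                    # marginal of y, taken from the joint
idx <- p > 0
MI  <- sum(p[idx] * (log(p[idx]) - log(px[row(p)[idx]]) - log(py[col(p)[idx]]))) * dx * dy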
One final note: while the MI estimate converges to the true MI with the histogram approach, the entropy estimate does not, because the bin widths cancel inside the log-ratio of the MI formula but not in the entropy. So if you want to estimate the differential entropy with a histogram approach, you have to correct for the bin size.
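For a 1d histogram with normalized counts h and bin width delta (names assumed), the corrected estimate would be:

H_diff <- -sum(h[h > 0] * log(h[h > 0])) + log(delta)   # H_discrete + log(bin width)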
Best Answer
I believe you don't need to discretize your vit variable because it is already discrete. You might also want to use mi.plugin to calculate the mutual information instead:
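A sketch of what that could look like (x2 is a placeholder name for your second variable; the bin number is arbitrary):

library(entropy)

y2d <- table(vit, discretize(x2, numBins = 10))   # joint count table
mi.plugin(y2d / sum(y2d))                         # plugin estimate of the MI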