Solved – Estimating the mutual information for two signal samples

Tags: density function, MATLAB, mutual information

I have two signals s1 and s2, sampled 170 times each.

x  = 0.2:0.2:34;       % 170 sample points
s1 = rand(size(x));    % uniformly distributed signal
s2 = randn(size(x));   % normally distributed signal

The calculation of the MI (mutual information) between two discrete variables requires knowledge of their marginal probability distribution functions and their joint probability distribution.
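
For reference, the quantity I am trying to compute is $I(X;Y) = \sum_{i,j} p_{ij}\,\log_2 \frac{p_{ij}}{p_i\, p_j}$ (in bits, hence the log2 in the code below).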

I am estimating each signal's marginal distribution using this Kernel Density Estimator.

[~,pdf1,xmesh1,~] = kde(s1);    % marginal density of s1, evaluated on the grid xmesh1
[~,pdf2,xmesh2,~] = kde(s2);    % marginal density of s2, evaluated on the grid xmesh2

I am estimating the joint pdf using this 2D Kernel Density Estimator.

data = [s1(:), s2(:)];              % kde2d expects the paired samples as an N-by-2 matrix
[~,pdf_joint,X,Y] = kde2d(data);

I created a function which takes as input the original signals, their marginal pdfs, and their joint pdf, and computes the Mutual Information.

Unfortunately, I seem to have some huge bug in this function, which I can't figure out. The MI should always be non-negative, but I am getting complex and/or negative numbers!

The function I wrote for computing the MI is shown below.

function mi = computeMI(s1, s2, pdf1, xmesh1, pdf2, xmesh2, pdf_joint, X, Y)

N = size(s1, 2);       % number of samples
p_i = zeros(1, N);     % marginal density of s1, evaluated at the samples
p_j = zeros(1, N);     % marginal density of s2, evaluated at the samples

for i = 1:N
    p_i(i) = interp1(xmesh1, pdf1, s1(i));
    p_j(i) = interp1(xmesh2, pdf2, s2(i));
end

mi = 0;

p_ij = zeros(N, N);    % joint density, evaluated at all pairs (s1(i), s2(j))

for j = 1:N
    for i = 1:N
        p_ij(i, j) = interp2(X, Y, pdf_joint, s1(i), s2(j));
        delta_mi = p_ij(i, j) * (log2(p_ij(i, j) / (p_i(i) * p_j(j))));
        mi = mi + delta_mi;
    end
end

Thank you very much for your help.

Best Answer

I cannot immediately see a bug in your program. However, I see a few things that might alter the outcome of your results.

First of all, I would use $\sum_{ij} p_{ij} \left(\log p_{ij} - \log p_i - \log p_j\right)$ instead of $\sum_{ij} p_{ij} \log \frac{p_{ij}}{p_i\cdot p_j}$ for numerical stability. As a general rule, if you have to multiply probabilities, it is better to work in the log-domain.
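
In your computeMI function the same change would, for instance, look like this (same variable names as in your inner loop):

delta_mi = p_ij(i,j) * (log2(p_ij(i,j)) - log2(p_i(i)) - log2(p_j(j)));   % difference of logs instead of log of a ratio of products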

Second, since you use different kernel density estimators for the marginals and the joint distribution, they might not be consistent. Let $p(x,y)$ be your joint estimate and $q(x)$ the estimate of one of your marginals. For the two estimates to be consistent, $q$ would have to equal the marginal of the joint estimate, i.e. $\int p(x,y)\, dy = q(x)$. This is not necessarily the case, since you used two different estimators.
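
You can see this numerically. As a sketch, assuming kde2d was run on the paired data [s1(:), s2(:)] so that X varies along the columns and Y along the rows of the returned meshgrid, the following compares the two estimates of the marginal of s1:

q_x = interp1(xmesh1, pdf1, X(1,:));     % marginal estimated directly by kde
p_x = trapz(Y(:,1), pdf_joint, 1);       % marginal obtained by integrating the joint over y
plot(X(1,:), q_x, X(1,:), p_x);          % the two curves will typically not coincide exactly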

Another thing that might get you in trouble is that you interpolate. If you interpolate a density, it will most certainly not integrate to one anymore (that, of course, depends on the type of interpolation you use). This means that you don't even use proper densities anymore.

The easiest solution you could just try out is to use a 2d histogram $H_{ij}$. Let $h_{ij}=H_{ij}/\sum_{ij}H_{ij}$ be the normalized histogram. Then you get the marginals via $h_i = \sum_j h_{ij}$ and $h_j = \sum_i h_{ij}$. The mutual information can then be computed via

lh1 = log(sum(h,1));                                  % log of the marginal h_j = sum_i h_ij
lh2 = log(sum(h,2));                                  % log of the marginal h_i = sum_j h_ij
T   = h .* bsxfun(@minus, bsxfun(@minus, log(h), lh1), lh2);
T(h == 0) = 0;                                        % empty bins would give 0*log(0) = NaN, but contribute nothing
I   = sum(T(:));

For an increasing number of data points and smaller bins, this should converge to the correct value. In your case, I would try different bin sizes, compute the MI, and take a bin size from the region where the value of the MI seems stable. Alternatively, you could use the heuristic by

Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66(3), 605-610. doi:10.1093/biomet/66.3.605

to choose the bin size.
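
As a rough sketch of that bin-size scan (histcounts2 requires R2015b or newer; the bin counts below are purely illustrative):

nbinsList = [4 8 16 32 64];                         % illustrative bin counts to scan
for k = 1:numel(nbinsList)
    H = histcounts2(s1, s2, nbinsList(k));          % 2D histogram of the paired samples
    h = H / sum(H(:));                              % normalized joint histogram h_ij
    T = h .* bsxfun(@minus, bsxfun(@minus, log(h), log(sum(h,1))), log(sum(h,2)));
    T(h == 0) = 0;                                  % empty bins contribute nothing
    fprintf('nbins = %2d  ->  MI = %.4f nats\n', nbinsList(k), sum(T(:)));
end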

If you really want to use kernel density estimators, I would adapt the approach from above and

  1. compute the marginals from the joint estimation
  2. not use interpolation. As I understand it, the KDE gives you the values of the density at arbitrary base points anyway; better use those (see the sketch below).
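
A minimal sketch of that grid-based computation, assuming X, Y and pdf_joint come from kde2d as in the question (X varying along columns, Y along rows) and clipping any small negative values the estimator might return:

dx = X(1,2) - X(1,1);                               % grid spacing in x
dy = Y(2,1) - Y(1,1);                               % grid spacing in y
p  = max(pdf_joint, 0);                             % clip tiny negative density values
p  = p / (sum(p(:)) * dx * dy);                     % renormalize so the joint integrates to one on the grid
px = sum(p, 1) * dy;                                % marginal density of x (integrate over y)
py = sum(p, 2) * dx;                                % marginal density of y (integrate over x)
T  = p .* bsxfun(@minus, bsxfun(@minus, log(p), log(px)), log(py));
T(p == 0) = 0;                                      % empty grid cells contribute nothing
mi = sum(T(:)) * dx * dy;                           % Riemann-sum approximation of the MI (in nats)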

One final note: while the MI converges to the true MI with the histogram approach, the entropy does not. So if you want to estimate the differential entropy with a histogram approach, you have to correct for the bin size.
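
For completeness, the standard correction with equal bin widths $\Delta$ is $h(X) \approx -\sum_i h_i \log h_i + \log \Delta$; in the MI the $\log\Delta$ terms of the joint and the marginals cancel, which is why the histogram estimate of the MI needs no such correction.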