Solved – Calculating the mutual information between two histograms

Tags: histogram, information theory, MATLAB, mutual information

I've been set a sample exercise by my supervisor, and I'm totally lost as to where I should be heading.

What I've been tasked with is to generate two histograms that approximate Gaussian PDFs, then shift the means of the histograms so that they overlap to some extent, and calculate the resulting drop in mutual information.

I've tried a variety of solutions, but none have given me reliable results so far. What I'm now trying to do is calculate the entropy of each of the histograms, and subtract the entropy of the joint histogram. However, even this is becoming difficult.
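For reference, the identity I'm trying to apply is $$ I(X;Y) = H(X) + H(Y) - H(X,Y), $$ where H denotes Shannon entropy.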

Right now, I'm working with the following code:

clear all

%Set the number of points and the number of bins;
points = 1000;
bins = 5;

%Set the probability of each stimulus occurring;
p_s1 = 0.5;
p_s2 = 0.5;

%Set the means and variance of the histogram approximations;
mu1 = 5;
mu2 = 8.2895;

sigma1 = 1;
sigma2 = 1;

%Set up the histograms;
randpoints1 = sigma1.*randn(1, points) + mu1;
randpoints2 = sigma2.*randn(1, points) + mu2;

[co1, ce1] = hist(randpoints1, bins);
[co2, ce2] = hist(randpoints2, bins);

%Determine the joint histogram;
[hist2D, binC] = hist3([randpoints1', randpoints2'], [bins, bins]);

prob2D = hist2D/points;

r_s1 = sum(prob2D, 2)'; %marginal pmf of randpoints1 (sum over columns)
r_s2 = sum(prob2D, 1); %marginal pmf of randpoints2 (sum over rows)

%Determine p(r) for each of the marginal histograms;
r1 = p_s1*r_s1;
r2 = p_s2*r_s2;

%Determine the mutual information for each of the marginal histograms;
minf1 = zeros(1, bins);
minf2 = zeros(1, bins);
for ii = 1:bins
    minf1(ii) = p_s1*r_s1(ii)*log2(r_s1(ii)/r1(ii));
    minf2(ii) = p_s2*r_s2(ii)*log2(r_s2(ii)/r2(ii));
end

minf1(isnan(minf1)) = 0;
minf2(isnan(minf2)) = 0;

Imax = sum(minf1) + sum(minf2);

From my (albeit limited) understanding of information theory, the above should have calculated the information "contained" within the first and second histograms, and summed them. I would expect a value of 1 for this sum, and I do indeed achieve this value. However, what I'm stuck on now is determining the joint histogram, and following from that, the joint entropy to subtract.

Is the prob2D matrix I've created the joint probability? If so, how can I use it?
Any insight or links to relevant papers would be much appreciated; I've been googling quite a bit, but I haven't been able to turn up anything of value.

Best Answer

According to Wikipedia, the mutual information of two random variables may be calculated using the following formula: $$ I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log{ \left(\frac{p(x,y)}{p(x)\,p(y)} \right) } $$
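Equivalently, this is the Kullback-Leibler divergence between the joint pmf and the product of the marginal pmfs, which is exactly the pair of distributions we will build below.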

Picking up your code from this point:

[co1, ce1] = hist(randpoints1, bins); 
[co2, ce2] = hist(randpoints2, bins);

We can proceed as follows:

% calculate each marginal pmf from the histogram bin counts
p1 = co1/sum(co1);
p2 = co2/sum(co2);

% calculate joint pmf assuming independence of variables
p12_indep = bsxfun(@times, p1.', p2);

% sample the joint pmf directly using hist3
p12_joint = hist3([randpoints1', randpoints2'], [bins, bins])/points;

% using the Wikipedia formula for mutual information
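% note: natural log gives the result in nats; use log2 if you want bits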
dI12 = p12_joint.*log(p12_joint./p12_indep); % mutual info at each bin
I12 = nansum(dI12(:)); % sum over all bins, ignoring empty (0*log 0 -> NaN) bins
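As a cross-check against the entropy route you were attempting, the same quantity equals $$ I(X;Y) = H(X) + H(Y) - H(X,Y). $$ A minimal sketch, reusing p12_joint from above and taking the marginals from the joint histogram so the bin edges stay consistent:

% cross-check via entropies: I(X;Y) = H(X) + H(Y) - H(X,Y)
px = sum(p12_joint, 2);                         % marginal pmf of Var1
py = sum(p12_joint, 1);                         % marginal pmf of Var2
Hx = -nansum(px.*log(px));                      % marginal entropy (nats)
Hy = -nansum(py.*log(py));
Hxy = -nansum(p12_joint(:).*log(p12_joint(:))); % joint entropy (nats)
I12_check = Hx + Hy - Hxy;                      % should be close to I12

Any small discrepancy from I12 comes from p12_indep being built from separately binned marginals rather than from the joint histogram.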

I12 for the random variables that you generate is quite low (~0.01), which is not surprising, since you generate them independently. Plotting the independence-assumed distribution and the sampled joint distribution side by side shows how similar they are:

[Figure: independence-assumed pmf vs. sampled joint pmf, nearly identical, showing no mutual information between the variables]

If, on the other hand, we introduce dependence by generating randpoints2 with a component of randpoints1, for example:

randpoints2 = 0.5*(sigma2.*randn(1, points) + mu2 + randpoints1);

I12 becomes much larger (~0.25), reflecting the mutual information that these variables now share. Plotting the above distributions again shows a clear difference (it would be clearer still with more points and bins, of course) between the joint pmf that assumes independence and the pmf obtained by sampling the variables simultaneously.

[Figure: independence-assumed pmf vs. sampled joint pmf, now visibly different due to the dependency between the variables]
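If you need to repeat this for several mean shifts, as in your exercise, the steps above can be bundled into a small helper. This is just a sketch, and the name mutinfo is mine; it takes the marginals from the joint histogram so the bin edges stay consistent. Save it as mutinfo.m:

function I = mutinfo(x, y, bins)
%MUTINFO Mutual information (in nats) of two samples, via a 2D histogram.
pj = hist3([x', y'], [bins, bins]) / numel(x); % sampled joint pmf
px = sum(pj, 2);                               % marginal pmf of x (column)
py = sum(pj, 1);                               % marginal pmf of y (row)
dI = pj .* log(pj ./ (px * py));               % per-bin contributions
I = nansum(dI(:));                             % skip empty (0*log 0) bins
end

Calling I12 = mutinfo(randpoints1, randpoints2, bins) then reproduces the numbers above, and you can sweep mu2 toward mu1 to watch how the value behaves as the histograms overlap.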

The code I used to plot the two distributions:

figure;
subplot(121); pcolor(p12_indep); axis square;
xlabel('Var2'); ylabel('Var1'); title('Independent: P(Var1)*P(Var2)');
subplot(122); pcolor(p12_joint); axis square;
xlabel('Var2'); ylabel('Var1'); title('Joint: P(Var1,Var2)'); 
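One caveat on the plots: pcolor does not render the last row and column of its input matrix, so with only 5 bins part of each pmf is hidden; imagesc(p12_indep) and imagesc(p12_joint) would show every bin.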