MATLAB: Speed up for loop in this code for calculating mutual information (maybe using GPU computing)

Tags: entropy, for loop, gpu, mutual information, parallel computing, performance

I want to use the code given HERE, which uses the Kraskov estimation procedure to estimate the mutual information between two time series (for details see Kraskov, Alexander, Harald Stögbauer, and Peter Grassberger. "Estimating mutual information." Physical Review E 69.6 (2004): 066138).
While the code works fine for my purposes, it runs too slowly because of the length of the time series and the number of different series involved (I am calculating mutual information between many pairs of time series). I ran the MATLAB profiler, and the following section of code in the linked function appears to be causing the slowdown:
% compute distance between each sample and its k-th nearest neighbour
dz = zeros(nObs, nObs);
dx = zeros(nObs, nObs);
dy = zeros(nObs, nObs);
for i = 1:nObs
    for j = 1:nObs
        dx(i,j) = sqrt(sum((X(i, :) - X(j, :)).^2));
        dy(i,j) = sqrt(sum((Y(i, :) - Y(j, :)).^2));
        dz(i,j) = max([dx(i, j), dy(i, j)]);
    end
end
Is there any way to speed this up? I was thinking a GPU-based total or partial solution might be feasible and offer a sufficient speed-up. Any alternative suggestions would be very helpful (perhaps parallelising with a parfor loop instead, although the speed-up would be smaller and it would complicate my future projects). I am currently using MATLAB R2016b.

Best Answer

For 1000x1000 inputs, this is already 6 times faster:
% Version 1:
n = size(X, 1);
X = X.';
Y = Y.';
dx = zeros(n, n);
dy = zeros(n, n);
for j = 1:n
    Xj = X(:, j);
    Yj = Y(:, j);
    for i = j+1:n
        dx(i,j) = sqrt(sum(bsxfun(@minus, X(:, i), Xj) .^ 2));
        dy(i,j) = sqrt(sum(bsxfun(@minus, Y(:, i), Yj) .^ 2));
        dx(j,i) = dx(i,j);
        dy(j,i) = dy(i,j);
    end
end
dz = max(dx, dy);
The original function took 29.5 sec (R2016b, Core2Duo, Win7/64), and the cleaned version 5.2 sec.
Here the data are processed columnwise, which is much faster because neighbouring elements of a column are stored contiguously in memory. The comparison by max() is moved outside the loop. And finally, the resulting matrices are symmetric, so dx(j,i) can simply be copied from dx(i,j) instead of being computed a second time.
I tried vectorizing the inner loop:
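The memory-layout effect can be checked with a small timing sketch (the array size here is arbitrary, chosen only to make the difference visible):

```matlab
% MATLAB stores arrays column-major, so column access is contiguous.
A = rand(2000);
tic
for j = 1:size(A, 2)
    c = A(:, j);          % one column: contiguous memory, fast
end
tCol = toc;
tic
for i = 1:size(A, 1)
    r = A(i, :);          % one row: strided access, slower
end
tRow = toc;
```

On most machines tCol comes out noticeably smaller than tRow, which is why transposing X and Y up front pays off.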
% Version 2:
n = size(X, 1);
X = X.';
Y = Y.';
dx = zeros(n, n);
dy = zeros(n, n);
for j = 1:n
    dx(j+1:n,j) = sqrt(sum(bsxfun(@minus, X(:, j+1:n), X(:, j)) .^ 2, 1));
    dy(j+1:n,j) = sqrt(sum(bsxfun(@minus, Y(:, j+1:n), Y(:, j)) .^ 2, 1));
    dx(j,j+1:n) = dx(j+1:n,j);
    dy(j,j+1:n) = dy(j+1:n,j);
end
dz = max(dx, dy);
But this takes 21 sec for 1000x1000 arrays. For smaller 100x100 inputs it is faster, however: 1.2 sec instead of 2.2 sec (100 iterations).
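If the Statistics and Machine Learning Toolbox is available, the whole distance matrix can also be obtained in a single call to pdist2, which is worth timing as a baseline (a sketch; note that pdist2 is a toolbox function, so check it is on your path):

```matlab
% Fully vectorized alternative using pdist2 (Statistics and Machine
% Learning Toolbox). X and Y are nObs-by-dim, as in the original question
% (i.e. NOT transposed).
dx = pdist2(X, X);    % Euclidean distances between all pairs of rows of X
dy = pdist2(Y, Y);
dz = max(dx, dy);
```

Whether this beats the loop versions depends on nObs and the dimension, so it is best measured on your own data.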
Now you have an efficient function as a starting point for parallelization or GPU computation. Maybe this is useful (I cannot test it):
% Version 3:
parfor v = 1:2
    if v == 1
        for j = 1:n
            dx(j+1:n, j) = sqrt(sum((X(:, j+1:n) - X(:, j)) .^ 2, 1));
            dx(j, j+1:n) = dx(j+1:n, j);
        end
    else
        for j = 1:n
            dy(j+1:n, j) = sqrt(sum((Y(:, j+1:n) - Y(:, j)) .^ 2, 1));
            dy(j, j+1:n) = dy(j+1:n, j);
        end
    end
end
But a parfor over the inner loop would use more cores.
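For the GPU route, a version of the vectorized loop might look like the sketch below. This is an untested assumption: it requires the Parallel Computing Toolbox and a CUDA-capable GPU, and the actual speed-up depends heavily on n and the data dimension.

```matlab
% Hedged GPU sketch (untested) using the Parallel Computing Toolbox.
% X and Y are dim-by-nObs here, i.e. already transposed as in Version 2.
Xg = gpuArray(X);
Yg = gpuArray(Y);
n  = size(Xg, 2);
dx = zeros(n, n, 'gpuArray');
dy = zeros(n, n, 'gpuArray');
for j = 1:n
    dx(j+1:n, j) = sqrt(sum(bsxfun(@minus, Xg(:, j+1:n), Xg(:, j)) .^ 2, 1));
    dy(j+1:n, j) = sqrt(sum(bsxfun(@minus, Yg(:, j+1:n), Yg(:, j)) .^ 2, 1));
    dx(j, j+1:n) = dx(j+1:n, j);
    dy(j, j+1:n) = dy(j+1:n, j);
end
dz = gather(max(dx, dy));   % bring the result back to CPU memory
```

The per-column work must be large enough to amortize the kernel-launch overhead of each loop iteration; for small n the CPU versions above will likely win.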