MATLAB: Speeding up code which integrates a 2D gpuArray matrix using Simpson’s Rule

gpuMATLABnumerical integrationoptimizationvectorization

I have a (real) 2D gpuArray, which I am using as part of a larger code, and now am trying to also integrate the array using the Composite Simpson Rule inside my main loop (several 10000 iterations at least). A MWE looks like the following:
%%%%%%%%%%%%%%%%%% MAIN CODE %%%%%%%%%%%%%%%%%%
Ny = 501; % Dimensions of matrix M
Nx = 501; %

dx = 0.1; % Grid spacings
dy = 0.2; %
M = rand(Ny, Nx, 'gpuArray'); % Initialise a matrix
for k = 1:10000
% M = function1(M) % Apply some other functions to M
% ... etc ...
I = simpsons_integration_2D(M, dx, dy, Nx, Ny); % Now integrate M
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%% INTEGRATOR %%%%%%%%%%%%%%%%%%
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.
% Integrate along x direction (vertically) --> IX is a vector afterwards
sX = sum( F(:,1:2:Nx-2) + 4*F(:,2:2:(Nx-1)) + F(:,3:2:Nx) , 2);
IX = dx/3 * sX;
% Integrate along y direction --> I is a scalar afterwards
sY = sum( IX(1:2:Ny-2) + 4*IX(2:2:(Ny-1)) + IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The operation of performing the integration is around 850 µs, which is currently a significant part of my code. This was measured using
f = @() simpsons_integration_2D(M, dx, dy, Nx, Ny);
t = gputimeit(f)
Is there a way to reduce the execution time for integrating the gpuArray matrix?
(The graphics card is the Nvidia Quadro P4000)

Best Answer

These are faster at least on CPU - I do not have graphics hardware in my laptop which supports GPU arrays:
function I = simpsons_integration_2D_v2(F, dx, dy, Nx, Ny)
% Avoid summation of large intermediate arrays:
sX = F(:,1) + 2 * sum(F(:, 3:2:(Nx-2)), 2) + ...
4 * sum(F(:, 2:2:(Nx-1)), 2) + F(:,Nx);
sY = sX(1) + 2 * sum(sX(3:2:(Ny-2))) + ...
4 * sum(sX(2:2:(Ny-1))) + sX(Ny);
I = dx * dy / 9 * sY;
end
function I = simpsons_integration_2D_v3(F, dx, dy, Nx, Ny)
% Perform summation as dot product:
v = [1, repmat([4, 2], 1, (Nx - 3) / 2), 4, 1];
sX = F * v';
if Nx ~= Ny
v = [1, repmat([4, 2], 1, (Ny - 3) / 2), 4, 1];
end
sY = v * sX;
I = dx * dy / 9 * sY;
end
Compared timeit output on my mobil i5:
0.00111591015 original
0.00066621015 without large intermediate array
0.00012221015 as matrix multiplication
So this is faster on my old 2 core i5 than on your GPU. Maybe this gives a speedup on your GPU also. I assume, in version v3 the vector v must be created on the GPU also.