The function integral2 is not supported for gpuArray data. You could implement your own version using supported functionality, such as via trapz:
function Z = trapz2(func, xmin, xmax, ymin, ymax, N)
xvals = gpuArray.linspace(xmin, xmax, N);
yvals = gpuArray.linspace(ymin, ymax, N);
[X, Y] = meshgrid(xvals, yvals);
xspacing = (xmax - xmin)/N;
yspacing = (ymax - ymin)/N;
F = arrayfun(func, X, Y);
Z1 = trapz(F) * yspacing;
Z = trapz(Z1) * xspacing;
You really shouldn't be putting all those scalar values on the GPU. Your large arrays should be on the GPU, but you should allow the GPU operations to move scalar data onto the GPU only when needed.
Try this version:
function tetha = double_integral(clz)
if nargin == 0
clz = 'double';
end
B = 1;
Z = single(0:1:14);
R = single(-3:1:3);
phi = 5;
Fo = 1;
H1 = 2;
H2 = 12;
R_len = uint8(length(R));
Z_len = uint8(length(Z));
tetha = zeros(Z_len,R_len,clz);
function z = myfunc(phi_1, Fo_1)
z = (1./(Fo-Fo_1).^(3./2)).*exp(-(R(p).^2+1)./(4*(Fo-Fo_1))).*exp((2.*R(p).*cos(phi-phi_1))./(4*(Fo-Fo_1))).*(exp(-(Z(k)-B.*phi_1./(2*pi)).^2./(4*(Fo-Fo_1)))-exp(-(Z(k)+B*phi_1./(2*pi)).^2./(4*(Fo-Fo_1))));
end
for k = 1:Z_len
for p = 1:R_len
if strcmp(clz, 'gpuArray')
A = trapz2(@myfunc, 2*pi*H1/B, 2*pi*H2/B, 0, Fo-0.02, 1000);
else
A = integral2(@myfunc, 2*pi*H1/B, 2*pi*H2/B, 0, Fo-0.02);
end
tetha(k,p) = B./(16.*pi.^(5/2)).*A;
end
end
end
I got this result:
>> timeit(@()double_integral('double'))
ans =
14.9615
>> gputimeit(@()double_integral('gpuArray'))
ans =
1.1724
So a 14x speed-up or thereabouts.
This version is data-parallel across evaluations of the function. That may not be the most efficient approach. You may want to be data parallel across Z_len and R_len. You would have to write an iterative version of the integration that operates on scalars and then you can call that on a large Z_len x R_len gpuArray matrix of inputs using arrayfun.
Best Answer