MATLAB: GPU arrayfun is so slow, what is going on?

arrayfun, gpu, parallel computing, Parallel Computing Toolbox

Hi,
I am trying to understand what GPU arrayfun is doing. The following is my test code:
clear;clc;close all
gd=gpuDevice();
reset(gd);
N=2e3;
a=rand(60,N,'single','gpuArray');
tic;
b=sum(a,1);
wait(gd);
toc;
tic;
c=arrayfun(@(i) sum(a(:,i),1),(1:N));
wait(gd);
toc;
The results are:
Elapsed time is 0.000468 seconds.
Elapsed time is 0.584521 seconds.
What is going on here? That is a difference of more than 1000 times. I would expect similar runtimes, since GPU arrayfun is supposed to execute in parallel on the GPU cores. Did I make a mistake in how I am using arrayfun?
Thanks!

Best Answer

What is the most efficient way to vectorize the above code?
I would say as follows:
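idx_Neighbor=randi([1 N],60,N,'uint8'); % random neighbor indices, 60-by-N (note: uint8 can only hold values up to 255)
temp=p(idx_Neighbor); % gather the neighbor values of p, 60-by-N
temp=temp+p.'; % add p(j) to column j (assuming p is N-by-1)
ax=sum(temp .* delW_x,1); % weighted sum down each column, 1-by-N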
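
For anyone who wants to run the snippet above on its own, here is a minimal sketch. The sizes and the contents of p and delW_x are assumptions, since the thread does not define them: p is taken to be an N-by-1 column vector and delW_x a 60-by-N weight matrix, both filled with random data. The index class is uint16 rather than uint8 because N=2e3 (the value from the question) does not fit in uint8. It runs on the CPU as written; creating p and delW_x with the 'gpuArray' option should run the same lines on the GPU.

```matlab
% Hypothetical sizes and data: p and delW_x are not defined in the thread.
N = 2e3;                           % number of points (value from the question)
K = 60;                            % neighbors per point
p      = rand(N,1,'single');       % assumed N-by-1 column vector
delW_x = rand(K,N,'single');       % assumed K-by-N weight matrix

% The index class must be able to hold N; uint8 stops at 255, so use uint16.
idx_Neighbor = randi([1 N],K,N,'uint16');

temp = p(idx_Neighbor);            % gather: K-by-N matrix of neighbor values
temp = temp + p.';                 % implicit expansion adds p(j) to column j
ax   = sum(temp .* delW_x,1);      % one weighted sum per column, 1-by-N
```

The whole computation is three array operations with no per-column function call, which is the same reason sum(a,1) in the question is so much faster than calling sum once per column.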