So I'm in the process of shifting some code over from CPU to GPU, and I ran into a weird issue: a loop that runs fast on the CPU is extremely slow on the GPU. Here is a simplified snippet that reproduces the issue:
tic;
d = gpuArray(0);
n = gpuArray(100);
P = gpuArray(rand(3, 100));
x = toc;
fprintf("Allocation time is %f\n", x);
tic;
for j = 1:n
    for k = 1:n
        d = d + (n^-2)*norm(P(:,j) - P(:,k));
    end
end
x = toc;
fprintf("Loop time is %f\n", x);
Allocation time is 0.007388
Loop time is 5.119713
I'm a little confused. This loop takes about 5 seconds on the GPU, but if I run it on the CPU it takes about 0.03 seconds.
Any thoughts? All of my data lives in gpuArrays, and norm() is a GPU-compatible built-in.
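My current suspicion is that each pass through the inner loop issues several tiny operations (the slice, the subtraction, the norm, the accumulation) to the GPU one at a time, so I'm paying per-operation launch overhead 10,000 times. As a sanity check I tried sketching a vectorized version that computes all pairwise distances in one shot; this assumes the quantity I actually want is the mean pairwise Euclidean distance over the columns of P (which is what my loop computes):

% Vectorized sketch (my assumption: goal is the mean pairwise distance).
% Builds the full n-by-n matrix of squared distances via broadcasting,
% then takes sqrt and averages, all in a handful of large GPU operations.
P  = gpuArray(rand(3, 100));
n  = size(P, 2);
s  = sum(P.^2, 1);            % 1-by-n row of squared column norms
D2 = s' + s - 2*(P'*P);       % n-by-n squared pairwise distances
D2 = max(D2, 0);              % clamp tiny negative round-off before sqrt
d  = sum(sqrt(D2), 'all') / n^2;

Is replacing the scalar loop with something like this the right way to use the GPU here, or is there a way to make the loop itself fast?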
Thanks.