MATLAB: Slow performance of fftn in the gpu when used inside a loop

fftn, gpu, gpuarray, loop, performance

I have just realized that the execution time of fftn operations inside a loop is not proportional to the number of iterations when working on the GPU.

As an example, if I define a cube in the GPU

a = gpuArray(ones(256,256,256,'single'));

I see that the elapsed time does not scale linearly with the number of loop iterations. For moderate loops I read:

 >> N=100;tic;for i=1:N;g=fftn(a);end;toc
Elapsed time is 0.008618 seconds.

… but for a loop which is 10 times bigger

>> N=1000;tic;for i=1:N;g=fftn(a);end;toc
Elapsed time is 7.299844 seconds.

the total time scales not by a factor of 10 but by a factor of about 850! I know tic/toc is not the best way to measure performance, but it is still the time seen by the users of the program. Is there some basic principle of handling gpuArrays inside loops that I am missing?

Best Answer

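Various methods on the GPU operate to some extent asynchronously: a call can return to MATLAB before the GPU has finished the work, so tic/toc around a short loop may measure little more than the time needed to queue the operations. There are limits to this, depending on the amount of memory available etc., and once they are reached further calls have to wait for the GPU. The best way to time GPU operations is to use gputimeit, like this: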

a = gpuArray(ones(256,256,256,'single'));
% Basic case, no looping
t1 = gputimeit(@() fftn(a));
% Looping cases
t100 = gputimeit(@() iLoop(a,100));
t1000 = gputimeit(@() iLoop(a,1000));
% Compare results
disp([t1, t100/100, t1000/1000])
% Local function: call fftn N times, discarding the result
function iLoop(a,N)
for i = 1:N
    fftn(a);
end
end

On my machine, the per-call times are consistent, i.e. gputimeit does a good job of getting an accurate time even for a single call to fftn. Running the above script, the result I get is:

>> repro
    0.0081    0.0081    0.0081
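
If tic/toc is preferred because it is the time the user of the program actually sees, the loops from the question can be timed the same way, provided the clock is only read after the GPU has finished. The following is a minimal sketch, not taken from the original answer: it assumes the currently selected device is the one holding a, and the variable names dev, k and t are illustrative only.

```matlab
% Sketch: time the loops from the question with tic/toc, but synchronize
% with the GPU before reading the clock.
a   = gpuArray(ones(256,256,256,'single'));
dev = gpuDevice();            % handle to the currently selected GPU

g = fftn(a);                  % warm-up call so one-time setup is not timed
wait(dev);

for N = [100 1000]
    tic;
    for k = 1:N
        g = fftn(a);
    end
    wait(dev);                % block until all queued GPU work has finished
    t = toc;
    fprintf('N = %4d: total %.4f s, per call %.6f s\n', N, t, t/N);
end
```

With the wait call in place, the per-call time should be roughly the same for both values of N, which is what gputimeit reports above.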

Related Solutions

MATLAB: Matlab + CUDA slow in solving matrix-vector equation A*x=B

To get accurate timings for GPU calculations you need to be sure to wait for the GPU to finish. You should modify all your timings accordingly:

g = gpuDevice();
tic;
f = fft(A);
wait(g);
toc;

Also, not all GPUs are created equal. To get a sense of what performance you can expect from your GPU in relation to other devices out there, you may want to run gpuBench which can be found here:

MATLAB: GPU enabled functions react slow on first call

MATLAB R2016b is built with CUDA 7.5. The first time you load the GPU libs they have to be just-in-time compiled to match your Pascal-architecture GPU. See the answer here for an explanation and some steps to prevent this compilation happening every time.
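
To see how much of a measured time is this one-time startup cost, the first GPU call in a session can be timed separately from a repeat of the same call. This is only a rough sketch, assuming a fresh MATLAB session in which the GPU has not been used yet; the array size and the names x, y, tFirst and tSecond are arbitrary choices for illustration.

```matlab
% Sketch: separate the first-call cost from the steady-state cost.
tic;
dev = gpuDevice();                   % first GPU use in the session: libraries are loaded here
x   = gpuArray(rand(1000,'single'));
y   = fft(x);
wait(dev);                           % make sure the GPU has actually finished
tFirst = toc;

tic;
y = fft(x);                          % same call again, libraries already loaded
wait(dev);
tSecond = toc;

fprintf('first call: %.3f s, second call: %.6f s\n', tFirst, tSecond);
```

A large gap between the two numbers points to startup cost rather than slow GPU code.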
