MATLAB: Matlab + CUDA slow in solving matrix-vector equation A*x=B

Tags: cuda, gpu, MATLAB, matrix, slow, vector

I am solving the equation A*x=B, where A is a matrix, B is a vector, and x is the unknown vector.

Hardware specs: Intel i7-3630QM (4 cores), NVIDIA GeForce GT 640M (384 CUDA cores)

Here's an example:

>> A=rand(5000);
>> B=rand(5000,1);
>> Agpu=gpuArray(A);
>> Bgpu=gpuArray(B);
>> tic;A\B;toc;
Elapsed time is 1.382281 seconds.
>> tic;Agpu\Bgpu;toc;
Elapsed time is 4.775395 seconds.

Somehow the GPU is much slower. Why? It is also slower for FFT, INV, and LU calculations, which should be related to matrix division.

However, the GPU is much faster at matrix multiplication (with the same data):

>> tic;A*B;toc;
Elapsed time is 0.014700 seconds.
>> tic;Agpu*Bgpu;toc;
Elapsed time is 0.000505 seconds.

The main question is: why is A\B (mldivide) so slow on the GPU compared to the CPU?

Best Answer

To get accurate timings for GPU calculations, you need to wait for the GPU to finish before stopping the timer. You should modify all your timings accordingly:

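g = gpuDevice();   % handle to the currently selected GPU
tic;
f = fft(Agpu);     % Agpu is the gpuArray from the question
wait(g);           % block until the GPU has finished
toc;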
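Applied to the mldivide example from the question, the same wait(g) pattern might look like the sketch below. It assumes Parallel Computing Toolbox and a supported GPU. The names xCpu, xGpu, tCpu and tGpu are illustrative, and the warm-up solve is an added precaution, since the first GPU call can include one-time setup cost.

```matlab
% Sketch: re-time the question's A\B with explicit GPU synchronization.
n = 5000;
A = rand(n);
B = rand(n, 1);

g = gpuDevice();          % handle to the currently selected GPU
Agpu = gpuArray(A);       % copy the inputs to GPU memory before timing
Bgpu = gpuArray(B);

% Warm-up solve (assumption: keeps one-time setup out of the measurement)
xGpu = Agpu \ Bgpu;
wait(g);

% CPU timing
tic;
xCpu = A \ B;
tCpu = toc;

% GPU timing: stop the timer only after the GPU has finished
tic;
xGpu = Agpu \ Bgpu;
wait(g);
tGpu = toc;

fprintf('CPU: %.4f s, GPU: %.4f s\n', tCpu, tGpu);
```

Comparing gather(xGpu) against xCpu is a quick way to confirm that both solves produce the same answer before reading anything into the timings.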

Also, not all GPUs are created equal. To get a sense of the performance you can expect from your GPU relative to other devices, you may want to run gpuBench, which is available on the MATLAB Central File Exchange.
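As a rough first check before running a full benchmark, the sketch below prints a few properties of the current device and compares the solve in double and single precision. The precision comparison is an assumption about what may matter here: rand produces double-precision data, and consumer GeForce cards typically have much lower double-precision than single-precision throughput. The variable names are illustrative.

```matlab
% Sketch: inspect the GPU and compare double vs. single precision solves.
g = gpuDevice();
fprintf('Device: %s (compute capability %s)\n', g.Name, g.ComputeCapability);
fprintf('Multiprocessors: %d, total memory: %.1f GB\n', ...
    g.MultiprocessorCount, g.TotalMemory / 2^30);

n = 5000;
Ad = gpuArray(rand(n));       % double precision (MATLAB default)
bd = gpuArray(rand(n, 1));
As = single(Ad);              % same data converted to single precision
bs = single(bd);

% gputimeit synchronizes with the GPU and averages several runs
tDouble = gputimeit(@() Ad \ bd);
tSingle = gputimeit(@() As \ bs);

fprintf('double: %.4f s, single: %.4f s\n', tDouble, tSingle);
```

If single precision turns out to be much faster, double-precision throughput is a likely bottleneck on that card. Whether single precision is accurate enough depends on the problem.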

Related Solutions

MATLAB: Gpu computation is slow sometimes

Some gpuArray operations run asynchronously, so tic and toc can give misleading timings. Use gputimeit to time gpuArray operations reliably.
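For the A\B comparison from the question, a minimal sketch using gputimeit (with timeit as the CPU counterpart) could look like this. It assumes Parallel Computing Toolbox and a supported GPU; tCpu and tGpu are illustrative names.

```matlab
% Sketch: time mldivide on the CPU with timeit and on the GPU with gputimeit.
n = 5000;
A = rand(n);
B = rand(n, 1);
Agpu = gpuArray(A);
Bgpu = gpuArray(B);

tCpu = timeit(@() A \ B);            % runs the function several times
tGpu = gputimeit(@() Agpu \ Bgpu);   % also waits for the GPU to finish

fprintf('CPU: %.4f s, GPU: %.4f s, ratio GPU/CPU: %.2f\n', ...
    tCpu, tGpu, tGpu / tCpu);
```

Both functions call the handle multiple times, so the figures are less sensitive to first-call overhead than a single tic/toc pair.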

MATLAB: Iterative solver with gpuArray

Even for much larger problem sizes (n=10240) and an older graphics card (GTX 580), I see negligible overhead from moving data between the CPU and GPU:

    n = 1024*10;
    Acpu = rand(n) + 100*eye(n);
    bcpu = rand(n,1);
    Agpu = gpuArray(Acpu);
    bgpu = gpuArray(bcpu);
    gputimeit(@() Agpu*bgpu)            % all data on GPU
    % 0.0052 sec
    gputimeit(@() gather( Agpu*bcpu ))  % requires data transfer
    % 0.0054 sec

The speed-up in GMRES also seems pretty good (a factor of 4):

    tic;
    x = gmres(@(x) Acpu*x, bcpu, []);
    toc
    % Elapsed time is 0.391786 seconds.
    tic;
    x = gmres(@(x) gather(Agpu*x), bcpu, []);
    toc
    % Elapsed time is 0.097924 seconds.
