I am solving the equation A*x=B, where A is a matrix, B is a vector, and x is the unknown vector.
Hardware specs: Intel i7 3630QM (4 cores), nVidia GeForce GT 640M (384 CUDA cores)
Here's an example:
>> A=rand(5000);
>> B=rand(5000,1);
>> Agpu=gpuArray(A);
>> Bgpu=gpuArray(B);
>> tic;A\B;toc;
Elapsed time is 1.382281 seconds.
>> tic;Agpu\Bgpu;toc;
Elapsed time is 4.775395 seconds.
Somehow the GPU is much slower. Why? It is also slower in FFT, INV, and LU calculations, which should be related to matrix division.
However, the GPU is much faster in matrix multiplication (with the same data):
>> tic;A*B;toc;
Elapsed time is 0.014700 seconds.
>> tic;Agpu*Bgpu;toc;
Elapsed time is 0.000505 seconds.
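A caveat on the tic/toc numbers above: gpuArray operations can return control to MATLAB before the GPU has finished, so a plain tic/toc may under-report GPU time, and the multiplication result in particular may look faster than it really is. The following is a minimal sketch of a more careful measurement, assuming the Parallel Computing Toolbox functions gputimeit and wait(gpuDevice) are available. The variable names (tCpu, tGpu, tGpuSingle, tManual) are illustrative only. The single-precision run tests an assumption, not a confirmed cause: that double-precision throughput on a consumer card like this one is the bottleneck.

```matlab
% Sketch: time A\B on the CPU and the GPU with synchronization-aware timers.
n = 5000;                      % same problem size as in the session above
A = rand(n);
B = rand(n, 1);

Agpu = gpuArray(A);            % transfer data once, outside the timed region
Bgpu = gpuArray(B);

% timeit/gputimeit run the function several times and return a typical time;
% gputimeit also makes sure the GPU work has completed before it stops timing.
tCpu = timeit(@() A \ B);
tGpu = gputimeit(@() Agpu \ Bgpu);

% Assumption to test: double precision is the slow path on this GPU.
AgpuS = single(Agpu);
BgpuS = single(Bgpu);
tGpuSingle = gputimeit(@() AgpuS \ BgpuS);

% Manual alternative to gputimeit: synchronize explicitly before calling toc.
tic;
y = Agpu * Bgpu;
wait(gpuDevice);               % block until the GPU has finished
tManual = toc;

fprintf('CPU A\\B (double):        %.4f s\n', tCpu);
fprintf('GPU A\\B (double):        %.4f s\n', tGpu);
fprintf('GPU A\\B (single):        %.4f s\n', tGpuSingle);
fprintf('GPU A*B (synchronized):  %.6f s\n', tManual);
```

If the synchronized A*B time comes out much larger than 0.000505 seconds, the original multiplication figure was mostly measuring launch overhead, not the computation itself.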
The main question is: why is A\B (mldivide) so much slower on the GPU than on the CPU?
Best Answer