MATLAB: Iterative solver with gpuArray

Tags: anonymous function, gmres, gpu, gpuarray, iterative, Parallel Computing Toolbox

Hi all,
Iterative solvers can be useful even with full matrices, which is my case. I would like to use an iterative solver such as GMRES on a full matrix, with both the matrix and the right-hand side stored as gpuArrays, but this does not seem to be supported in MATLAB R2013a.
My data are:
>> n = 1024;
>> Acpu = rand(n)+100*eye(n);
>> bcpu = rand(n,1);
>> Agpu = gpuArray(Acpu); bgpu = gpuArray(bcpu);
I tried both
>> x = gmres(Agpu,bgpu,[]);
Error using iterchk (line 39)
Argument must be a floating point matrix or a function handle.
Error in gmres (line 86)
[atype,afun,afcnstr] = iterchk(A);
and
>> x = gmres(@(x)(Agpu*x),bgpu,[]);
The following error occurred converting from gpuArray to double:
Conversion to double from gpuArray is not possible
Error in gmres (line 297)
U(:,1) = u;
The only way I found to make it work is to gather each matrix-vector product back to the CPU:
>> x = gmres(@(x)gather(Agpu*x),bcpu,[]);
gmres converged at iteration 7 to a solution with relative residual 2.4e-07.
That is terribly ugly, because at every iteration the input vector is copied to the GPU and the resulting matrix-vector product is copied back to system memory. Is there any way to run GMRES on the GPU using MATLAB built-in functions?
Thanks in advance,
Fabio

Best Answer

Even for a much larger problem (n = 10240) and a not-so-new graphics card (GTX 580), I see negligible overhead from transferring data between the CPU and the GPU:
n = 1024*10;
Acpu = rand(n)+100*eye(n);
bcpu = rand(n,1);
Agpu = gpuArray(Acpu);
bgpu = gpuArray(bcpu);
gputimeit(@() Agpu*bgpu)            % all data already on the GPU
% 0.0052 sec
gputimeit(@() gather(Agpu*bcpu))    % requires data transfer in both directions
% 0.0054 sec
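A rough way to see why the transfer cost is so small: the matrix stays resident on the GPU, and only one vector of length n crosses the bus in each direction per product. The back-of-the-envelope sketch below just counts bytes, assuming double precision (8 bytes per element). The variable names are illustrative, and it says nothing about transfer latency.

```matlab
% Data volume for one matrix-vector product with n = 10240 (double precision).
n = 1024*10;
bytesPerDouble = 8;

matrixBytes = n^2 * bytesPerDouble;   % uploaded once, then stays on the GPU
vectorBytes = n   * bytesPerDouble;   % moved per direction, per product

fprintf('Matrix on GPU:        %.0f MiB\n', matrixBytes / 2^20);   % 800 MiB
fprintf('Vector per transfer:  %.0f KiB\n', vectorBytes / 2^10);   % 80 KiB
fprintf('Matrix/vector ratio:  %d\n', matrixBytes / vectorBytes);  % 10240
```

The transferred data is smaller than the data the GPU reads for the product by a factor of n, which is consistent with the two timings above being nearly identical.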
The speed-up in GMRES also seems pretty good (a factor of 4):
tic;
x = gmres(@(x) Acpu*x,bcpu,[]);
toc
%Elapsed time is 0.391786 seconds.
tic;
x = gmres(@(x)gather(Agpu*x),bcpu,[]);
toc
%Elapsed time is 0.097924 seconds.
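To confirm that the GPU-backed function handle gives a usable solution, one option is to request the extra outputs of gmres and recompute the residual on the CPU. This is a minimal sketch using the smaller problem from the question. The names afun and trueRelres are illustrative, and it assumes a supported GPU and the Parallel Computing Toolbox are available.

```matlab
% Solve with the matrix-vector product on the GPU, then check the result on the CPU.
n = 1024;
Acpu = rand(n) + 100*eye(n);
bcpu = rand(n,1);
Agpu = gpuArray(Acpu);

% The product runs on the GPU; gather returns the result to host memory,
% so gmres itself only ever sees ordinary double arrays.
afun = @(x) gather(Agpu*x);

% Requesting flag suppresses the printed convergence message.
[x, flag, relres, iter] = gmres(afun, bcpu, []);

% Independent check of the relative residual using the CPU copy of the matrix.
trueRelres = norm(bcpu - Acpu*x) / norm(bcpu);

fprintf('flag = %d, inner iterations = %d\n', flag, iter(2));
fprintf('relres from gmres = %.2e, recomputed = %.2e\n', relres, trueRelres);
```

A flag of 0 means gmres reached its default tolerance of 1e-6. The recomputed residual should be of the same order as the value gmres reports.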