MATLAB: Solving multiple systems of equations using GPU and iterative methods

gpuiterative methodsmultiple linear systemsparallel

I am trying to solve an inverse by solving multiple systems of linear equations. The coefficient matrix is large and sparse, and direct methods produce significant rounding errors. I have used the incomplete LU factorization as preconditioner for the iterative methods, and parallelization using CPU is quite simple as follows:
[L1,U1] = ilu(U,struct('type','ilutp','droptol',tolILU));
E3aux = speye(size(U));
parfor j=1:size(U,1),
[E3(:,j),~] = bigs(U,E3aux(:,j),[],[],L1,U1);
end
However, since bigs function also works with GPUarrays, I wonder if it would be possible to make something similar using the GPU. I know that it is possible to do it for one system of equations as follows:
cpuTime = timeit(@()bicg(U,b,[],[],L1,U1));
fprintf('It takes %3.4f seconds on the CPU to execute a systems of equations.\n',cpuTime);
Ugpu = gpuArray(full(U));
bgpu = gpuArray(full(b));
L1gpu = gpuArray(full(L1));
U1gpu = gpuArray(full(U1));
gpuTime = gputimeit(@()bicg(Ugpu,bgpu,[],[],L1gpu,U1gpu));
fprintf('It takes %3.4f seconds on the GPU to execute system.\n',gpuTime);
fprintf(['Unvectorized GPU code is %3.2f times slower ',...
'than the CPU version.\n'], gpuTime/cpuTime);
Getting the following output:
It takes 0.0195 seconds on the GPU to execute system.
Unvectorized GPU code is 25.29 times slower than the CPU version.
However, I do not know:
  1. How to use preconditioners without transforming them into full matrices.
  2. How to paralellize for multiple independent vectors.

Best Answer

Sparse gpuArrays have been supported since R2015b and bicg since R2016b, so you should just call bicg with the original sparse matrices and you should get good performance. If you pass your system matrix U as a preconditioner then ILU is the technique used to apply it so you will get the same behaviour. You cannot pass two matrix preconditioners to the sparse gpuArray version of bicg.
The iterative solvers cannot (by their very nature) solve for multiple right-hand-sides at the same time. You can only do that by calling the solver repeatedly - or by using a direct solver.
Related Question