I am trying to solve an inverse by solving multiple systems of linear equations. The coefficient matrix is large and sparse, and direct methods produce significant rounding errors. I have used the incomplete LU factorization as preconditioner for the iterative methods, and parallelization using CPU is quite simple as follows:
[L1,U1] = ilu(U,struct('type','ilutp','droptol',tolILU)); E3aux = speye(size(U)); parfor j=1:size(U,1), [E3(:,j),~] = bigs(U,E3aux(:,j),[],[],L1,U1); end
However, since bigs function also works with GPUarrays, I wonder if it would be possible to make something similar using the GPU. I know that it is possible to do it for one system of equations as follows:
cpuTime = timeit(@()bicg(U,b,[],[],L1,U1));fprintf('It takes %3.4f seconds on the CPU to execute a systems of equations.\n',cpuTime);Ugpu = gpuArray(full(U));bgpu = gpuArray(full(b));L1gpu = gpuArray(full(L1));U1gpu = gpuArray(full(U1));gpuTime = gputimeit(@()bicg(Ugpu,bgpu,[],[],L1gpu,U1gpu));fprintf('It takes %3.4f seconds on the GPU to execute system.\n',gpuTime);fprintf(['Unvectorized GPU code is %3.2f times slower ',... 'than the CPU version.\n'], gpuTime/cpuTime);
Getting the following output:
It takes 0.0195 seconds on the GPU to execute system.Unvectorized GPU code is 25.29 times slower than the CPU version.
However, I do not know:
- How to use preconditioners without transforming them into full matrices.
- How to paralellize for multiple independent vectors.
Best Answer