MATLAB: GPU utilization and parallel computation for heavy computation

Tags: gpu, monte carlo, Parallel Computing Toolbox

I have a decent machine with a Core i7 (8 cores), 32 GB of RAM, and an NVIDIA GeForce GTX 1080 Ti, running MATLAB R2018b. At the moment I am a bit confused about how best to use these resources to run my Monte-Carlo simulation code. I have two questions:
1- How can I make all the heavy computation run on the GPU, alongside MATLAB's parallel computing capability, rather than on the CPU, so that I can decide which is best to use? I have read various help topics, and the conclusion I have reached is that the data I work with should be in the form of a gpuArray. Am I right, or am I missing something? Let us assume that I have the following simple code to be run on the GPU:
First_Vector=zeros(2,3);
% First_Vector=zeros(2,3,'gpuArray'); % 1
[N,M]=size(First_Vector);
% [N,M]=size(First_Vector); % 2 - size works unchanged on a gpuArray
Second_Matrix=ones(N,M,2);
% Second_Matrix=ones(N,M,2,'gpuArray'); % 3
Test1=[20 20 20; 30 30 30];
% Test1=gpuArray(Test1); % 4
Test2=[50 50 50; 60 60 60];
% Test2=gpuArray(Test2); % 5
K=100;
% the main code
for i=1:N          % loop over the 2 rows
    for j=1:M      % loop over the 3 columns
        element=Function1(Test1(i,j),K);
        Test1(i,j)=element;
    end
end
Second_Matrix(:,:,1)=Test1;
Test1=Function2(Test1,Test2);
% End of the main code
%% Function 1
function outcome=Function1(A,K)
outcome=A+K;
end
%% Function 2
function T1=Function2(T1,T2)
T1=T1+T2;
end
Are the commented lines (1-5) enough to make the 'main code' run on the GPU?
2- I have tested the following simple code on the GPU and on the CPU, and the CPU's performance was far better than the GPU's. Is that supposed to be normal? Thanks in advance.
% GPU version
G = ones(10,10,'gpuArray');
tic
for k=1:100
    for i=1:1000
        for j=1:10
            G(j,:)=G(j,:)+2;
        end
    end
end
toc

% CPU version
G = ones(10,10);
tic
for k=1:100
    for i=1:1000
        for j=1:10
            G(j,:)=G(j,:)+2;
        end
    end
end
toc
% Elapsed time is 0.628241 seconds. (CPU version)

Best Answer

I'll try to answer your questions in order...
  1. Yes! Once your inputs are gpuArrays, built-in operations such as +, ones, and indexing execute on the GPU automatically, and the results stay on the device until you gather them back. Isn't that great? (A sketch follows below.)
  2. Yes, because there are two problems with your code: (a) you're using a lot of for loops instead of vectorized operations, and (b) you're measuring GPU performance incorrectly, since tic/toc can stop the clock before asynchronous GPU work has actually finished. To fix (a), you should read the documentation page on vectorization, which explains how to restructure your code for the best performance. To fix (b), take a look at my answer to a previous question and use the functions timeit and gputimeit. (A second sketch follows below.)
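
To illustrate point 1, here is a minimal sketch of your 'main code' moved onto the GPU. It keeps the same Function1/Function2 from the question; the only new ingredient is arrayfun, which, when its input is a gpuArray, compiles the applied function to run element-wise on the GPU (the function body must stick to supported element-wise operations, which A+K is):

K = 100;
Test1 = gpuArray([20 20 20; 30 30 30]);   % inputs start life on the GPU
Test2 = gpuArray([50 50 50; 60 60 60]);
Second_Matrix = ones(size(Test1,1), size(Test1,2), 2, 'gpuArray');

% Replaces the nested i/j loops: Function1 is applied to every element on the GPU
Test1 = arrayfun(@(a) Function1(a,K), Test1);

Second_Matrix(:,:,1) = Test1;
Test1 = Function2(Test1, Test2);   % + between gpuArrays also runs on the GPU

Result = gather(Test1);            % copy back to host memory only when needed

function outcome = Function1(A,K)
outcome = A + K;
end

function T1 = Function2(T1,T2)
T1 = T1 + T2;
end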
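
For point 2, here is a minimal sketch of a fairer benchmark (the helper names are just for illustration). The row loop collapses into a single whole-matrix addition, and timeit/gputimeit replace tic/toc: they run the function several times and average the timings, and gputimeit waits for the GPU to finish before stopping the clock:

G_cpu = ones(10,10);
G_gpu = ones(10,10,'gpuArray');

step = @(G) G + 2;   % one whole-matrix add replaces the j loop over rows

tCpu = timeit(@() step(G_cpu));    % robust CPU timing
tGpu = gputimeit(@() step(G_gpu)); % robust GPU timing, including synchronization
fprintf('CPU: %.3g s per call, GPU: %.3g s per call\n', tCpu, tGpu);

Note that even measured this way, the CPU may still win here: a 10-by-10 matrix is far too small to amortize the overhead of launching GPU kernels, and GPUs typically pull ahead only once arrays reach many thousands of elements.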