MATLAB: Does the GPU load increase over time?

matlab gpu memory load effective computational rate

Here's my code:
% for j=1:4
tic;
reset(gpuDevice(1)); clear all; % clean up
format long; % show double precision
R_i_gpu=gpuArray(6); % initial radius
dL_gpu=gpuArray(1.e-5); % delta length
n_gpu=R_i_gpu/dL_gpu; % calculate the number of steps or intervals
theta_i_gpu=dL_gpu/R_i_gpu; % calculate the initial theta
R_final_gpu=gpuArray(2); % final radius
dR_gpu=R_final_gpu-R_i_gpu; % calculate delta radius
d_theta_gpu=gpuArray(3*pi/2); % total angle over which the radius varies (dR/dtheta = (R_final-R_initial)/d_theta)
dR_d_theta_gpu=dR_gpu/d_theta_gpu; % calculate the rate of change of radius with respect to theta
% Ri=R_i-(dR_d_theta*theta_i)
% thetai=dL/Ri
Ri_gpu=R_i_gpu; % initialise radius
thetai_gpu=theta_i_gpu; % initialise theta
n=gather(n_gpu);
itime=toc
tic;
for i=1:n-1; % for loop
Ri_gpu=Ri_gpu+(dR_d_theta_gpu*thetai_gpu); % update radius
thetai_gpu=dL_gpu/Ri_gpu; % update theta
R_gpu(i)=Ri_gpu; % append the radius to a row array (grows by one element per iteration)
theta_gpu(i)=thetai_gpu; % append the theta to a row array (grows by one element per iteration)
% A_gpu=[R_gpu ;theta_gpu]'; % create the radius/theta array
end
rtime=toc
%%tic
% R_gpu=[R_i_gpu R_gpu]'; % horizontally concatenate the initial radius with the radius array that's calculated for each of the interval steps
theta_gpu=[theta_i_gpu theta_gpu]'; % hcat the initial theta with the theta array
theta_sum_gpu=sum(theta_gpu); % sum the theta (in radians)
theta_sum_deg_gpu=theta_sum_gpu*360/(2*pi); % convert the theta sum to degrees
% ptime=toc
% end
I'm running this on a 3930K with an NVIDIA GTX 660 Superclocked, and I noticed that as dL_gpu goes from 1.e-4 to 1.e-5, the effective computational rate decreases. So I started using GPU-Z to monitor the memory usage, the GPU load, and the GPU memory controller load, and found that both the GPU load and the GPU memory controller load increase as time goes on. Now I'm trying to figure out why.
If you're solving a 1-D integration with a Newton-like approximation, shouldn't every step/iteration take the same amount of time?
I'm trying to understand how MATLAB builds arrays for A(i)=B. Does it rebuild the entire array at each iteration, or does it just add the latest entry to the end of the list?
And if it only adds the latest entry, why does the memory controller load also go up as a function of time?
Any help understanding what's going on behind the scenes would be greatly appreciated! Thank you!

Best Answer

A(i) = B without pre-allocation will add data to the end of your array until it runs out of space; then it will allocate more space, copy the existing array across, and continue. This takes a lot of time, and it is where all your little spikes come from. The larger the array, the longer each resize operation takes: with smaller deltas your array gets longer and longer, so everything runs slower and slower.
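To illustrate the pre-allocation point, here is a minimal sketch of the same loop from the question, run on the CPU with the output arrays sized up front. The variable names without the _gpu suffix and the use of round() to get an integer step count are my own assumptions, not part of the original code.

```matlab
% Same recurrence as in the question, on the CPU, with pre-allocated outputs.
R_i     = 6;                 % initial radius
R_final = 2;                 % final radius
dL      = 1e-5;              % delta length
d_theta = 3*pi/2;            % angle over which the radius varies
dR_d_theta = (R_final - R_i)/d_theta;

n = round(R_i/dL);           % assumption: round to get an integer step count

R     = zeros(1, n-1);       % allocate once, so the loop never resizes
theta = zeros(1, n-1);

Ri     = R_i;                % initialise radius
thetai = dL/R_i;             % initialise theta
for i = 1:n-1
    Ri     = Ri + dR_d_theta*thetai;   % update radius
    thetai = dL/Ri;                    % update theta
    R(i)     = Ri;                     % write into existing storage
    theta(i) = thetai;
end

theta = [dL/R_i, theta];               % prepend the initial theta
theta_sum_deg = sum(theta)*180/pi;     % total angle in degrees
```

With the arrays allocated before the loop, each iteration only writes into existing memory, so the per-step cost should stay roughly constant regardless of how small dL is.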
Note that what you are doing here is not appropriate for GPU computation. The GPU is useful for operating in parallel on large arrays of data. You are not doing anything in parallel here, so the GPU is mostly idle and you've wasted a lot of time sending data over to it.
Another tip: don't put scalar data onto the GPU (e.g. R_i_gpu=gpuArray(6)). Only put arrays on the GPU. GPU code will automatically bring scalars across to the GPU if and when it is necessary.
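As a rough sketch of the kind of work that does suit the GPU, the example below applies one elementwise expression to a large array and keeps the scalar on the host. The array size, the expression, and the variable names are purely illustrative, and the canUseGPU check assumes R2019b or later; without a GPU the same code simply runs on the CPU.

```matlab
% Illustrative only: one large array, one elementwise expression.
N = 1e6;
x = linspace(0, 1, N);           % build the data on the host

if canUseGPU                     % assumption: R2019b or later
    x = gpuArray(x);             % a single transfer of a large array
end

k = 3*pi/2;                      % plain host scalar, no gpuArray(...) needed

% Every element is processed independently, which is what the GPU is good at.
y = sin(k*x).^2 + cos(k*x).^2;

total = gather(sum(y));          % bring only the final scalar back to the host
```

The contrast with the loop in the question is that here there is no dependence between elements, so the whole array can be processed at once rather than one scalar step at a time.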