MATLAB: Parallel Computing – use 8 cores and 2 video cards

gpu, parallel computing

Hi all! I have a Lenovo Y510P laptop with a Core i7 processor and 2 GeForce GT 755M cards, and I want to use all 8 cores and both video cards in my MATLAB program.
Currently my code has a parfor loop (with a matlabpool of size 8) that calculates a numerical gpuArray on each iteration and concatenates the resulting arrays into a master gpuArray of sorts. Within the parfor, my user-defined function initializes the gpuArrays and populates them with the relevant calculations.
My program goes something like this:
N = 5; % number of variables in my table
n = 1000; % number of calculations to be performed
result_table = gpuArray.zeros(0,N);
parfor i = 1:n
    result_subset = myCalcFunc(input1, input2...);
    % concatenating result_subset with master result_table
    result_table = [result_table; result_subset];
end
Since the size of my result_subset varies from one iteration to the next, I need a way to keep track of the table height and make sure my result_subset doesn't have any empty rows. The other option would be to initialize an empty matrix and keep resizing it, but then I lose the benefit of preallocating memory.
function result_subset = myCalcFunc(input1, input2 ...)
result_subset = gpuArray.zeros(n,N);
index_table = 0;
for i = 1:n
    % Initial calculation of inputs on the CPU, then convert to gpuArray
    inputArray1 = gpuArray(Calc1(input1));
    inputArray2 = gpuArray(Calc2(input2));
    % Calculations on the GPU; if the result passes my conditions,
    % add it to result_subset
    outputArray = GPU_Calc(inputArray1, inputArray2...);
    if fail
        continue
    end
    index_table = index_table + 1;
    % store the result as one row (assumes outputArray is 1-by-N)
    result_subset(index_table,:) = outputArray;
end
% delete the remaining empty rows
result_subset = result_subset(1:index_table,:);
As to why I prime the inputs on the CPU: those calculations run much faster on the CPU than on the GPU (particularly the intersect function).
I've been measuring the performance of my CPUs and GPUs using Resource Monitor and MSI Afterburner, and I notice that my GPU2 is not used at all. From my research on Loren's blog (<http://blogs.mathworks.com/loren/2013/06/24/running-monte-carlo-simulations-on-multiple-gpus/>), it seems like you should only have one worker per GPU, but what does this mean in terms of CPU cores? (1 worker = 1 core??)
TL;DR: Can I use both of my GPUs at the same time in a matlabpool of size 8, similar to SLI?
Thanks

Best Answer

A couple of questions answered:
  • Your graphics card is optimised for single precision compute. Make sure your data is in single precision rather than double (the default) to get best performance.
  • In a parallel pool one worker is assigned to one physical core. If you want each worker to use more than one core for CPU computation then you need to manually set maxNumCompThreads on each worker. If you have fewer workers than cores (because you're doing multi-GPU for instance) then this may actually give you a benefit for certain kinds of CPU computation, but not always.
  • You can have any number of workers use any number of GPUs. If you want 4 workers, one per physical core, to access one GPU (which sounds like the situation you are in) then just go ahead. However, on Windows there's no way for the GPU to run kernels from several workers simultaneously. Usually this is fine, because your GPU is fully occupied running the computation from any one process and has no space to run kernels for another process. If this isn't the case, then on Linux you can use NVIDIA's Multi-Process Service (MPS).
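To make the points above concrete, here is a minimal sketch rather than a drop-in solution. It assumes R2013b or later (parpool instead of matlabpool), a machine with 4 physical cores, and it replaces myCalcFunc with a made-up random calculation and a hypothetical pass/fail test. It opens one worker per GPU, selects a device on each worker, keeps the data in single precision, and collects the variable-sized results in a cell array that is concatenated once after the loop. Recent releases may already assign a different GPU to each worker by default, in which case the explicit gpuDevice call is just a safeguard.

```matlab
% Sketch: one worker per GPU, single-precision data, variable-sized results.
nGpus = gpuDeviceCount;               % 2 on the laptop described in the question
pool = gcp('nocreate');               % reuse an open pool if there is one
if isempty(pool)
    pool = parpool(nGpus);            % one worker per GPU
end

nCores = 4;                           % ASSUMPTION: number of physical cores
threadsPerWorker = max(1, floor(nCores / pool.NumWorkers));

spmd
    % Spread the workers over the GPUs; with more workers than GPUs,
    % several workers share a device.
    gpuDevice(mod(labindex - 1, nGpus) + 1);
    % Let each worker use the spare cores for its CPU-side work.
    maxNumCompThreads(threadsPerWorker);
end

n = 1000;                             % number of iterations, as in the question
N = 5;                                % number of columns, as in the question
pieces = cell(n, 1);                  % one slot per iteration, any row count

parfor i = 1:n
    % Stand-in for myCalcFunc: single-precision random data on the GPU.
    x = rand(10, N, 'single', 'gpuArray');
    keep = x(:, 1) > 0.5;             % hypothetical pass/fail condition
    pieces{i} = gather(x(keep, :));   % return only the passing rows to the host
end

result_table = vertcat(pieces{:});    % concatenate once, after the loop
```

Gathering each piece back to the host keeps the final table in ordinary memory, so it does not matter which GPU produced which rows. If the table has to live on a GPU afterwards, it can be moved back with gpuArray(result_table) on the client.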