MATLAB: Parallel programming with createJob


Hi. I'm having some confusion regarding the functions createJob and createTask.
I have a function, compute(a,b), that returns the number of primes between a and b. It doesn't use isprime(x); instead, it uses a loop to check whether each number between a and b is prime.
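The original compute.m is not shown in the post. A function of this kind might look like the minimal sketch below, which assumes simple trial division; the variable names count and isPrimeN are illustrative only.

```matlab
function count = compute(a, b)
% COMPUTE  Count the primes in the closed interval [a, b] by trial division.
% Hypothetical sketch; not the actual compute.m from the question.
count = 0;
for n = max(a, 2):b              % 0 and 1 are not prime, so start at 2
    isPrimeN = true;
    for d = 2:floor(sqrt(n))     % a composite n has a divisor <= sqrt(n)
        if mod(n, d) == 0
            isPrimeN = false;    % found a divisor, so n is not prime
            break
        end
    end
    if isPrimeN
        count = count + 1;
    end
end
end
```

Because each number is tested independently, the range can be split into sub-ranges whose counts are simply added together.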
Now let's say my goal is to find the number of primes between a=1 and b=30. I could simply write NumPrimes = compute(1,30); and get the desired value. I wanted to make the script run faster, though, so I decided to parallelize (is that a word?) the code.
I had three workers available, so I split the numbers from 1 to 30 into three groups: 1 to 10, 11 to 20, and 21 to 30.
(In my actual script I obviously used larger numbers, and I made sure that each group took a similar amount of time for a lab to execute.)
Simply using a parfor loop or an spmd block, assigning each group to a different lab, cuts the execution time. I wanted to do it a different way, though, and that's when I started having trouble with my code.
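For comparison, the parfor variant mentioned above might look like the following sketch. The names starts, stops and counts are illustrative, and the sketch assumes that compute is on the path and that a pool of workers is available.

```matlab
% Sketch: one parfor iteration per group of numbers.
starts = [1 11 21];                      % first number of each group
stops  = [10 20 30];                     % last number of each group
counts = zeros(1, numel(starts));        % per-group prime counts

parfor k = 1:numel(starts)
    % Iterations are independent, so each can run on a different worker.
    counts(k) = compute(starts(k), stops(k));
end

NumPrimes = sum(counts);                 % combine the per-group counts
```

The groups here have equal width only for illustration; as noted above, in practice the ranges should be chosen so that each group takes a similar time.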
I decided to create a job with three tasks, each one taking care of a different group of numbers. The tasks were supposed to run in parallel, and once the job was complete I'd retrieve the outputs of each task. Here's the code:
matlabpool open 3
job = createJob('configuration','local','FileDependencies',{'compute.m'});
createTask(job,@compute,1,{1,10})
createTask(job,@compute,1,{11,20})
createTask(job,@compute,1,{21,30})
submit(job)
wait(job)
out = getAllOutputArguments(job);
This didn't run as expected. It took about three times longer than simply running the code serially with NumPrimes = compute(1,30); which is the opposite of what I wanted…
I think I'm not understanding how createJob and createTask work. I thought that when you create three tasks and submit the job, the tasks would run in parallel on different workers, cutting the execution time compared with a serial run. That can't be what's happening, though. The job doesn't seem to be running serially either, since the time increases by a factor of 3…
I'm really confused. If anyone could point me in the right direction, I'd appreciate it.
Thanks. Daniel

Best Answer

Hi Daniel,
The problem is in the first line of code, where you open a matlabpool. This is not required when using jobs and tasks, and it will in fact use up resources on your system that are then unavailable to the submitted job. My guess is that you have a 4-core machine: the matlabpool is using 3 of those cores, which leaves 1 for the job to run on, and as you have 3 tasks running on one core it is taking 3 times longer.
Everything else you have done is fine; just remove the first line that opens the matlabpool and try again.
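With that line removed, the script from the question would look roughly like the sketch below. It keeps the older createJob / getAllOutputArguments syntax used in the question. The final summation, the variable name total and the destroy call are additions for illustration and were not part of the original post.

```matlab
% Sketch: the job from the question, without opening a matlabpool first.
% Uses the older Parallel Computing Toolbox syntax from the question.
job = createJob('configuration','local','FileDependencies',{'compute.m'});

% One task per group of numbers; each task returns one output argument.
createTask(job,@compute,1,{1,10});
createTask(job,@compute,1,{11,20});
createTask(job,@compute,1,{21,30});

submit(job);   % queue the tasks for the local scheduler's workers
wait(job);     % block until every task has finished

out = getAllOutputArguments(job);   % cell array with one row per task
total = sum([out{:}]);              % added for illustration: combine the counts

destroy(job);  % added for illustration: remove the job's data when done
```

Several of these names changed in later releases (for example, parpool replaced matlabpool and fetchOutputs replaced getAllOutputArguments), so it is worth checking the documentation for the version you are running.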
Cheers, Tom