The actual overhead from invoking a parfor is pretty low (~17 ms), so it is fast enough to start up a parfor, do one operation per worker, and then repeat:
for i = 1:N
    parfor j = 1:10
        X{j} = gpuArray(X{j});
        Y{j} = MyFunction(X{j}); % <-- Takes about 1 second per worker
    end
end
However, it seems that MATLAB re-copies all of the data in X{j} over to the GPU on each iteration of the for loop. I would like X{j} to persist on the GPU between for loop iterations.
One hacky solution is to embed another for loop inside the parfor to reduce the amount of re-copying. This is not ideal for my application (I'm doing gradient descent function optimization).
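For reference, here is a rough sketch of that workaround. The inner iteration count K is a hypothetical parameter I made up for illustration, and the gather call at the end is an assumption (it just brings the result back to host memory). It only shows the loop structure, not my real update step:

```matlab
K = 5;                            % hypothetical: inner iterations per parfor call
for i = 1:ceil(N/K)
    parfor j = 1:10
        Xj = gpuArray(X{j});      % one host-to-GPU copy per parfor call
        Yj = [];                  % initialize the temporary before the inner loop
        for k = 1:K
            Yj = MyFunction(Xj);  % K evaluations reuse the same GPU copy
        end
        Y{j} = gather(Yj);        % assumption: return the result as a host array
    end
end
```

This cuts the number of copies by roughly a factor of K, but the workers cannot exchange results until the inner loop finishes, which is why it does not fit gradient descent well.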
Hopefully there is a simple way to force each X{j} to remain on its respective GPU.
Best Answer