Hi,
I am using the Parallel Computing Toolbox, and it seems that copying data back to the host with gather() generally takes much longer than copying data to the GPU with gpuArray(). For example, if I try:
A = rand(500, 500, 50); tic, B = gpuArray(A); toc, tic, gather(B); toc
the gather() call takes about 0.055 seconds, while the gpuArray() call takes only 0.018 seconds. Is this behavior expected, or am I timing it the wrong way?
Thanks.
Best Answer